I’ve been meaning to write this post since May 2022, when I was invited to present at a SCONUL event on ‘AI for libraries’. It’s hard to write anything about AI that doesn’t feel outdated before you hit ‘post’, especially since ChatGPT made generative AI suddenly accessible and interesting to ‘ordinary’ people. Some of the content is now practically historical but I'm posting it partly because I liked their prompts, and it's always worth thinking about how quickly some things change while others are more constant.
Prompt 1. Which library AI projects (apart from your own) have most sparked your interest over recent years?
Library of Congress 'Humans in the Loop: Accelerating access and discovery for digital collections initiative' experiments and recommendations for 'ethical, useful, and engaging' work https://labs.loc.gov/work/experiments/humans-loop/
Understanding visitor comments at scale – sentiment analysis of TripAdvisor reviews https://medium.com/@CuriousThirst/on-artificial-intelligence-museums-and-feelings-598b7ba8beb6
Various 'machines looking through documents' research projects, including those from Living with Machines – reading maps, labelling images, disambiguating place names, looking for change over time
Prompt 2. Which three things would you advise library colleagues to consider before embarking on an AI project?
Think about your people. How would AI fit into existing processes? Which jobs might it affect, and how? What information would help your audiences? Can AI actually reliably deliver it with the data you have available?
AI isn't magic. Understand the fundamentals. Learn enough to understand training and testing, accuracy and sources of bias in machine learning. Try tools like https://teachablemachine.withgoogle.com
Consider integration with existing systems. Where would machine-created metadata enhancements go? Is there a granularity gap between catalogue records and digitised content?
Prompt 3. What do you see as the role for information professionals in the world of AI?
Advocate for audiences
• Make the previously impossible, possible – and useful!
Advocate for ethics
• Understand the implications of vendor claims – your money is a vote for their values
• If it's creepy or wrong in person, it's creepy or wrong in an algorithm (?)
'To see a World in a Grain of Sand'
• A single digitised item can be infinitely linked to places, people, concepts – how does this change 'discovery'?
A TL;DR is that it's incredible how many of the projects discussed wouldn't have been possible (or would have been less feasible) a year ago. Whisper and ChatGPT (4, even more than 3.5) and many other new tools really have brought AI (machine learning) within reach. Also, the fact that I can *copy and paste text from a photo* is still astonishing. Some fantastic parts of the future are already here.
Other thinking aloud / reflections on themes from the event: the gap between experimentation and operationalisation for AI in GLAMs is still huge. Some folk are desperate to move onto operationalisation, others are enjoying the exploration phase – thinking about it, knowing where you and your organisation each stand on that could save a lot of frustration! Bridging it is possible, but it takes dedicated resources (including quality checking) from multi-disciplinary teams, and probably the goal has to be big and important enough to motivate all the work required. In examples discussed at FF2023, the scale of the backlog of collection items to be processed is that big important thing that motivates work with AI.
It didn't come up as directly, perhaps because many projects are still pilots rather than in production, but I'm very interested in the practical issues around including 'enriched' data from AI (or crowdsourcing) in GLAM collections management / cataloguing systems. We need records that can be enriched with transcriptions, keywords and other data iteratively over time, and that can record and display the provenance of that data – but can your collections systems do that?
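To make that question concrete, here is a minimal sketch of a record that accumulates enrichments over time and keeps the provenance of each one. It is only an illustration, not a description of any real collections management system: the class names, field names, the 'tagger-v1' tool label and the sample playbill are all made up.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Enrichment:
    """One added value, with enough provenance to display or revisit it later."""
    field_name: str   # e.g. "transcription" or "keyword" (hypothetical field names)
    value: str
    source: str       # "cataloguer", "crowdsourcing" or "machine"
    agent: str        # a person, project, or tool-and-version label
    added: date


@dataclass
class CollectionRecord:
    identifier: str
    title: str
    enrichments: list[Enrichment] = field(default_factory=list)

    def enrich(self, enrichment: Enrichment) -> None:
        # Append rather than overwrite, so earlier values and their origins survive.
        self.enrichments.append(enrichment)

    def values_for(self, field_name: str) -> list[tuple[str, str]]:
        # Pair each value with its source so an interface could show both.
        return [(e.value, e.source) for e in self.enrichments
                if e.field_name == field_name]


# Invented sample data: a machine-generated keyword, later refined by a volunteer.
record = CollectionRecord("item-001", "Playbill, Theatre Royal")
record.enrich(Enrichment("keyword", "pantomime", "machine", "tagger-v1", date(2023, 11, 1)))
record.enrich(Enrichment("keyword", "Christmas pantomime", "crowdsourcing", "volunteer", date(2023, 12, 1)))
print(record.values_for("keyword"))
```

Appending rather than replacing is the design choice that matters here: it leaves room to re-run a tool later, compare its output with earlier values, and show readers where each piece of data came from.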
Making LLMs stick to content in the item is hard: 'hallucinations' and loose interpretations of instructions are an issue. It's so useful hearing about things that didn't work or were hard to get right – common errors in different types of tools, etc. But who'd have thought that working with collections metadata would involve telling bedtime stories to convince LLMs to roleplay as an expert cataloguer?
Workflows are vital! So many projects have been assemblages of different machine learning / AI tools with some manual checking or correction.
A general theme in talks and chats was the temptation to lower 'quality' to be able to start to use ML/AI systems in production. People are keen to generate metadata with the imperfect tools we have now, but that runs into issues of trust for institutions expected to publish only gold standard, expert-created records. We need new conventions for displaying 'data in progress' alongside expert human records, and flexible workflows that allow for 'humans in the loop' to correct errors and biases.
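One way to picture a 'humans in the loop' workflow is a simple routing step that sends less certain machine output to a person before it is shown alongside expert records. The sketch below is an assumption-laden illustration rather than anything presented at the event: the function name, the 0.9 threshold and the sample predictions are invented, and the confidence scores a model reports are not necessarily well calibrated.

```python
def route_for_review(predictions, threshold=0.9):
    """Split machine output into 'show as data in progress' and 'needs human review'.

    `predictions` is a list of (item_id, label, confidence) tuples.
    The threshold is arbitrary and would need tuning for each task.
    """
    in_progress, needs_review = [], []
    for item_id, label, confidence in predictions:
        target = in_progress if confidence >= threshold else needs_review
        target.append((item_id, label, confidence))
    return in_progress, needs_review


# Invented sample output from a hypothetical image-labelling tool.
predictions = [
    ("item-001", "map", 0.97),
    ("item-002", "portrait", 0.55),
    ("item-003", "playbill", 0.90),
]
in_progress, needs_review = route_for_review(predictions)
print(len(in_progress), "to display as in-progress;", len(needs_review), "for review")
```

A threshold alone won't catch confident errors or systematic biases, so a real workflow would probably also sample the 'in progress' pile for checking.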
If we are in an 'always already transitional' world where the work of migrating from one cataloguing standard or collections management tool to another is barely complete before it's time to move to the next format/platform, then investing in machine learning/AI tools that can reliably manage the process is worth it.
'Data ages like wine, software like fish' – but it used to take a few years for software to age, whereas now tools are outdated within a few months – how does this change how we think about 'infrastructure'? Looking ahead, people might want to re-run processes as tools improve (or break) over time, so they should be modular. Keep (and version) the data, don't expect the tool to be around forever.
I ran a workshop and went to two others the day before the conference proper began. I've put photos from the 'Co-Creating an AI Responsive Information Literacy Curriculum' workshop I ran with Thomas Padilla (originally proposed with Nora McGregor and Silvia Gutiérrez De la Torre too) on Flickr. You can check out our workshop prompts and links to the 'AI literacy' curricula devised by participants.
Fantastic Futures Day 1
Thomas Mboa opens with a thought-provoking keynote. Is AI in GLAMs a Pharmakon (a purification ritual in ancient Greece where criminals were expelled)? Pharmakon can mean both medicine and poison.
Can we ensure cultural integrity alone, from our ivory tower?
How can we involve data-providers communities without exploiting them?
AI feeds on data, which in turn conveys biases. How can we ensure the quality of data?
Cultural integrity is a measure of the wholeness or intactness of material, whether it respects and honours traditional ownership, traditions and knowledge
Thomas Mboa finishes with 'Some Key actions to ensure responsible use of AI in GLAM':
Develop Ethical Guidelines and Policies
Address Bias and Ensure Inclusivity
Enhance Privacy and Data Security
Balance AI with Human Expertise
Foster Digital Literacy and Skills Development
Promote Sustainable and Eco-friendly Practices
Encourage Collaboration and Community Engagement
Monitor and Evaluate AI Impact
Intellectual Property and Copyright Considerations
Preserve Authenticity and Integrity
I shared lessons for libraries and AI from Living with Machines then there was a shared presentation on Responsible AI and governance – transparency/notice and clear explanations; risk management; ethics/discrimination, data protection and security.
Mike Trizna (and Rebecca Dikow) on the Smithsonian's AI values statement. Why We Need an AI Values Statement – everyone at the Smithsonian involved in data collection, creation, dissemination, and/or analysis is a stakeholder – Our goal is to aspirationally and proactively strive toward shared best practices across a distributed institution. All staff should feel like their expertise matters in decisions about technology.
From Bart Murphy (and Mary Sauer Games)'s talk it seems OCLC are really doing a good job operationalising AI to deduplicate catalogue entries at scale, maintaining quality and managing cost of cloud compute; also keeping ethics in mind.
Next, Abigail Potter and Laurie Allen, Introducing the LC Labs Artificial Intelligence Planning Framework. I love that LC Labs do the hard work of documenting and sharing the material they've produced to make experimentation, innovation and implementation of AI and new technologies possible in a very large library that's also a federal body.
Abby talked about their experiments with generating catalogue data from ebooks, co-led with their cataloguing department.
A panel discussed questions like: how do you think about "right sizing" your AI activities given your organizational capacity and constraints? How do you think about balancing R&D / experimentation with applying AI to production services / operations? How can we best work with the commercial sector? With researchers? What do you think the role of LAMs should be within the AI sector and society? How can we leverage each other as cultural heritage institutions?
I liked Stu Snydman's description of organising at Harvard to address AI with their values: embrace diverse perspectives, champion access, aim for the extraordinary, seek collaboration, lead with curiosity. And Ingrid Mason's description of NFSA's question about their 'social licence' (to experiment with AI) as an 'anchoring moment'. And there are so many reading groups!
Some of the final talks brought home how much more viable ChatGPT 4 has made some tasks, and included the first of two projects trying to work around the fact that people don't provide good metadata when depositing things in research archives.
Fantastic Futures Day 2
Day 2 begins with Mike Ridley on 'The Explainability Imperative' (for AI; XAI). We need trust and accountability because machine learning is consequential. It has an impact on our lives. Why isn't explainability the default?
His explainability priorities for LAM: HCXAI; Policy and regulation; Algorithmic literacy; Critical making.
Mike quotes 'Not everything that is important lies inside the black box of AI. Critical insights can lie outside it. Why? Because that's where the humans are.' Ehsan and Riedl.
Mike – explanations should be actionable and contestable. They should enable reflection, not just acquiescence.
Algorithm literacy for LAMs – embed into information literacy programmes. Use algorithms with awareness. Create algorithms with integrity.
Policy and regulation for GLAMs – engage with policy and regulatory activities; insist on explainability as a core principle; promote an explanatory systems approach; champion the needs of the non-expert, lay person.
Critical making for GLAMs – build our own tools and systems; operationalise the principles of HCXAI; explore and interrogate for bias, misinformation and deception; optimise for social justice and equity
Mike quotes: "Technology designers are the new policymakers; we didn't elect them but their decisions determine the rules we live by." Latanya Sweeney (Harvard University, Director of the Public Interest Tech Lab)
Next: shorter talks on 'AI and collections management'. Jon Dunn and Emily Lynema shared work on AMP, an audiovisual metadata platform, built on https://usegalaxy.org/ for workflow management. (Someone mentioned https://airflow.apache.org/ yesterday – I'd love to know more about GLAMs experiences with these workflow tools for machine learning / AI)
Nice 'AI explorer' from Harvard Art Museums https://ai.harvardartmuseums.org/search/elephant presented by Jeff Steward. It's a really nice way of seeing art through the eyes of different image tagging / labelling services like Imagga, Amazon, Clarifai, Microsoft.
(An example I found: https://ai.harvardartmuseums.org/object/228608. Showing predicted tags like this is a good step towards AI literacy, and might provide an interesting basis for AI explainability as discussed earlier.)
Scott Young and Jason Clark (Montana State University) shared work on Responsible AI at Montana State University. And a nice quote from Kate Zwaard, 'Through the slow and careful adoption of tech, the library can be a leader'. They're doing 'irresponsible AI scenarios' – a bit like a project pre-mortem with a specific scenario e.g. lack of resources.
Emmanuel A. Oduagwu from the Department of Library & Information Science, Federal Polytechnic, Nigeria, calls for realistic and sustainable collaborations between developing countries – library professionals need technical skills to integrate AI tools into library service delivery; they can't work in isolation from ICT. How can other nations help?
Finally, Leo Lo and Cynthia Hudson Vitale presented draft guiding principles from the US Association of Research Libraries (ARL). Points include the need to include human review; prioritise the safety and privacy of employees and users; prioritise inclusivity; democratise access to AI and be environmentally responsible.
Technology job listings for cultural heritage or the humanities aren't always easy to find. I've recently been helping recruit into various tech roles at the British Library while also answering questions from folk looking for work in the digital heritage / GLAM (galleries, libraries, archives, museums) tech / digital humanities (DH)-ish world, so I've collated some notes on where to look for job ads. Plus, some bonus thoughts on preparing for a job search and applying for jobs.
Preparing for a job search / post
If you're looking for work, setting up alerts or subscribing to various job sites can give you a sense of what's out there, and the skills and language you'd want to include in your CV or portfolio. If you're going to advertise vacancies, it helps to get a sense of how others describe their jobs.
Lurking on slacks and mailing lists gives you exposure to local jargon. Even better, if you can post occasionally to help someone with a question, as people might recognise your name later. Events – meetups, conferences, seminars, etc – can be good for meeting people and learning more about a sector. Serendipitous casual chats are easier in-person, but online events are more accessible.
Applying for GLAM/DH jobs
You probably know this, but sometimes a reminder helps… it's often worth applying for a job where you have most, but not all of the required skills. Job profiles are often wish lists rather than complete specs. That said, pay attention to the language used around different 'essential' vs 'desirable' requirements as that can save you some time.
Please, please pay attention to the questions asked during the application process and figure out (or ask) how they're shortlisting based on the questions they ask. At the BL we can only shortlist with information that applicants provide in response to questions on the application. In other places, reflecting the language and specific requirements in the job ad and profile in your cover letter matters more. And I'm sorry if you've spent ages on it, but never assume that people can see your CV during the shortlisting or interview process.
If you see a technology or method that you haven't tried, getting familiar with it before an interview can take you a long way. Download and try it, watch videos, whatever – showing willing and being able to relate it to your stronger skills helps.
The UK GLAM sector tends not to be able to offer visa sponsorship, but remote contracts may be possible. Always read the fine print…
Translate job descriptions and profiles to help candidates understand your vacancy
Updating to add: public organisations often have obscure job titles and descriptions. You can help translate jargon and public / charity / arts / academic sector speak into something closer to the language potential candidates might understand by writing blogposts and social media / discussion list messages that explain what the job actually involves, why it exists, and what a typical day or week might look like.
Cultural Heritage/Digital Humanities Slacks – most of these have jobs channels
My favourite analogy for AI / machine learning-based tools[1] is that they’re like working with a child. They can spin a great story, but you wouldn’t bet your job on it being accurate. They can do tasks like sorting and labelling images, but as they absorb models of the world from the adults around them you’d want to check that they haven’t mistakenly learnt things like ‘nurses are women and doctors are men’.
Libraries and other GLAMs have been working with machine learning-based tools for a number of years, cumulatively gathering evidence for what works, what doesn’t, and what it might mean for our work. AI can scale up tasks like transcription, translation, classification, entity recognition and summarisation quickly – but it shouldn’t be used without supervision if the answer to the question ‘does it matter if the output is true?’ is ‘yes’.[2] Training a model and checking the results of an external model both require resources and expertise that may be scarce in GLAMs.
But the thing about toddlers is that they’re cute and fun to play with. By the start of 2023, ‘generative AI’ tools like the text-to-image tool DALL·E 2 and large language models (LLMs) like ChatGPT captured the public imagination. You’ve probably heard examples of people using LLMs as everything from an oracle (‘give me arguments for and against remodelling our kitchen’) to a tutor (‘explain this concept to me’) to a creative spark for getting started with writing code or a piece of text. If you don’t have an AI strategy already, you’re going to need one soon.
The other thing about toddlers is that they grow up fast. GLAMs have an opportunity to help influence the types of teenagers then adults they become – but we need to be proactive if we want AI that produces trustworthy results and doesn’t create further biases. Improving AI literacy within the GLAM sector is an important part of being able to make good choices about the technologies we give our money and attention to. (The same is also true for our societies as a whole, of course).
Since the 2017 summit, I’ve found myself thinking about ‘collections as data’ in two ways.[3] One is the digitised collections records (from metadata through to full page or object scans) that we share with researchers interested in studying particular topics, formats or methods; the other is the data that GLAMs themselves could generate about their collections to make them more discoverable and better connected to other collections. The development of specialist methods within computer vision and natural language processing has promise for both sorts of ‘collections as data’,[4] but we still have much to learn about the logistical, legal, cultural and training challenges in aligning the needs of researchers and GLAMs.
The buzz around AI and the hunger for more material to feed into models has introduced a third – collections as training data. Libraries hold vast repositories of historical and contemporary collections that reflect both the best thinking and the worst biases of the society that produced them. What is their role in responsibly and ethically stewarding those collections into training data (or not)?
As we learn more about the different ‘modes of interaction’ with AI-based tools, such as the ‘text-grounded’, ‘knowledge-seeking’ and ‘creative’,[5] and collect examples of researchers and institutions using tools like large language models to create structured data from text,[6] we’re better able to understand and advocate for the role that AI might play in library work. Through collaborations within the Living with Machines project, I’ve seen how we could combine crowdsourcing and machine learning to clear copyright for orphan works at scale; improve metadata and full text searches with word vectors that help people match keywords to concepts rather than literal strings; disambiguate historical place names; and turn symbols on maps into computational information.
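For readers who haven't met word vectors, the idea of matching keywords to concepts rather than literal strings can be sketched with cosine similarity. This is a toy illustration, not the Living with Machines code: the three-dimensional vectors are invented by hand, whereas real word vectors are learnt from large amounts of text and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, vectors):
    """Rank every other term by how close its vector is to the query term's vector."""
    scores = [(term, cosine_similarity(vectors[query], vec))
              for term, vec in vectors.items() if term != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hand-made toy vectors, purely for illustration.
toy_vectors = {
    "locomotive": [0.9, 0.8, 0.1],
    "engine": [0.8, 0.9, 0.2],
    "harvest": [0.1, 0.2, 0.9],
}

ranked = rank_by_similarity("locomotive", toy_vectors)
print(ranked)
```

With these made-up numbers a search for 'locomotive' ranks 'engine' well above 'harvest', even though the strings share no letters in common order. That is the sense in which vectors can help a search match concepts rather than literal text.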
Our challenge now is to work together with the Silicon Valley companies that shape so much of what AI ‘knows’ about the world, with the communities and individuals that created the collections we care for, and with the wider GLAM sector to ensure that we get the best AI tools possible.
[1] I’m going to use ‘AI’ as a shorthand for ‘AI and machine learning’ throughout, as machine learning models are the most practical applications of AI-type technologies at present. I’m excluding ‘artificial general intelligence’ for now.
[2] Tiulkanov, “Is It Safe to Use ChatGPT for Your Task?”
[3] Much of this thinking is informed by the Living with Machines project, a mere twinkle in the eye during the first summit. Launched in late 2018, the project aims to devise new methods, tools and software in data science and artificial intelligence that can be applied to historical resources. A key goal for the Library was to understand and develop some solutions for the practical, intellectual, logistical and copyright challenges in collaborative research with digitised collections at scale. As the project draws to an end five and a half years later, I’ve been reflecting on lessons learnt from our work with AI, and on the dramatic improvements in machine learning tools and methods since the project began.
Opening – I’m sorry I can’t be in the room today, not least because the programme lists so many interesting talks.
Today I wanted to think about the different ways that public humanities work through crowdsourcing still has a place in an AI-obsessed world… what happens if we think about different ways of ‘listening’ to an audio archive like Le Show, by people, by machines, and by people and machines in combination?
What visions can we create for a future in which people and machines tune into different frequencies, each doing what they do best?
Overview
My work in crowdsourcing / data science in GLAMs
What can machines do?
The Le Show archive (as described by Rosa)
Why do we still need people listening to Le Show and other audio archives?
My current challenge is working out the role of crowdsourcing when 'AI can do it all'…
Of course AI can't, but we need to articulate what people and what machines can do so that we can set up systems that align with our values.
If we leave it to the commercial sector and pure software guys, there’s a risk that people are regarded as part of the machine; or are replaced by AI rather than aided by AI.
[Then I did a general 'crowdsourcing and data science in cultural heritage / British Library / Living with Machines' bit]
Given developments in 'AI' (machine learning)… What can AI/data science do for audio?
Transcribe speech for text-based search, methods
Detect some concepts, entities, emotions –> metadata for findability
Support 'distant reading'
–Shifts, motifs, patterns over time
–Collapse hours, years – take time out of the equation
Machine listening?
–Use 'similarity' to find sonic (not text) matches?
A massive 'portal' of 'conceptual and sonic hyperlinks to late-20th- and early-21st-century news and culture'
A 'polyphonic cornucopia of words and characters, lyrics and arguments, fact and folly'
'resistant to datafication'
With koine topoi – issues of common or public concern
'Harry Shearer is a portal: Learn one thing from Le Show, and you’ll quickly learn half a dozen more by logical consequence'
dr. rosa a. eberly
(Le Show reminds me of a time when news was designed to inform more than enrage.)
Why let machines have all the fun?
People can hear a richer range of emotions, topics and references, recognise impersonations and characters -> better metadata, findability
What can’t machines do? Software might be able to transcribe speech with pretty high accuracy, but it can't (reliably)… recognise humour, sarcasm, rhetorical flourishes, impersonations and characters – all the wonderful characteristics of the Le Show archive that Rosa described in her opening remarks yesterday. A lot of emotions aren’t covered in the ‘big 8’ that software tries to detect.
Software can recognise some subjects that e.g. have Wikipedia entries, but it’d also miss so much of what people can hear.
So, people can do a better job of telling us what's in the archive than computers can. Together, people and computers can help make specific moments more findable, and create metadata that could be used to visualise links between shows – by topic, by tone, music and more.
Could access to history in the raw, 'koine topoi' be a super-power?
Individual learning via crowdsourcing contributes to an informed, literate society
It's not all about the data. Crowdsourcing creates a platform and a reason for engagement. Your work helps others, but it also helps you.
I've shown some of my work with objects from the history of astronomy; playbills for 19th c British theatre performances, and most recently, newspaper articles from the long 19th c.
Through this work, I've come to believe that giving people access to original historical sources is one of the most important ways we can contribute to an informed, literate society.
A society that understands where we've come from, and what that means for where we're going.
A society that is less likely to fall for predictions of AI dooms or AI fantasies, because they've seen tech hype before.
A society that is less likely to believe that 'AI might take your job' because they know that the executives behind the curtain are the ones deciding whether AI helps workers or 'replaces' them.
I've worried about whether volunteers would be motivated to help transcribe audio or text, classify or tag images, when 'AI can do it'. But then I remembered that people still knit jumpers (sweaters) when they can buy them far more quickly and cheaply.
So, crowdsourcing still has a place. The trick is to find ways for 'AI' to aid people, not replace them. To figure out the boring bits and the bits that software is great at; so that people can spend more time on the fun bits.
Harry Shearer's ability to turn something into a topic, 'news of microplastics', 'of bees', is something of a super power. To amplify those messages is another gift, one the public can create by and for themselves.
As I'm speaking today at an event that's mostly in French, I'm sharing my slides outline so it can be viewed at leisure, or copy-and-pasted into a translation tool like Google Translate.
Closing conference of the Testaments de Poilus project, Archives nationales de France, 25 November 2022
Crowdsourcing as connection: a constant star over a sea of change, Mia Ridge, British Library
GLAM values as a guiding star
(Or, how will AI change crowdsourcing?) My argument is that technology is changing rapidly around us, but our skills in connecting people and collections are as relevant as ever:
Crowdsourcing connects people and collections
AI is changing GLAM work
But the values we express through crowdsourcing can light the way forward
(GLAM – galleries, libraries, archives and museums)
A sea of change
AI-based tools can now do many crowdsourced tasks:
Transcribe audio; typed and handwritten text
Classify / label images and text – objects, concepts, 'emotions'
AI-based tools can also generate new images, text
Deep fakes, emerging formats – collecting and preservation challenges
AI is still work-in-progress
Automatic transcription, translation failure from this morning: 'the encephalogram is no longer the mother of weeks'
Results have many biases; cannot be used alone
White, Western, 21st century view
Carbon footprint
Expertise and resources required
Not easily integrated with GLAM workflows
Why bother with crowdsourcing if AI will soon be 'good enough'?
The elephant in the room; been on my mind for a couple of years now
The rise of AI means we have to think about the role of crowdsourcing in cultural heritage. Why bother if software can do it all?
Crowdsourcing brings collections to life
Close, engaged attention to 'obscure' collection items
Opportunities for lifelong learning; historical and scientific literacy
Gathers diverse perspectives, knowledge
Crowdsourcing as connection
Crowdsourcing in GLAMs is valuable in part because it creates connections around people and collections
Between volunteers and staff
Between people and collections
Between collections
Examples from the British Library
In the Spotlight: designing for productivity and engagement
Living with Machines: designing crowdsourcing projects in collaboration with data scientists that attempt to both engage the public with our research and generate research datasets. Participant comments and questions inspired new tasks, shaped our work.
How do we follow the star?
Bringing 'crowdsourcing as connection' into work with AI
Valuing 'crowdsourcing as connection'
Efficiency isn't everything. Participation is part of our mission
Help technologists and researchers understand the value in connecting people with collections
Develop mutual understanding of different types of data – editions, enhancement, transcription, annotation
Perfection isn't everything – help GLAM staff define 'data quality' in different contexts
Where is imperfect AI data at scale more useful than perfect but limited data?
'réinjectée' – when, where, and how?
How do crowdsourcing and AI change work for staff?
How do we integrate data from different sources (AI, crowdsourcing, cataloguers), at different scales, into coherent systems?
How do interfaces show data provenance, confidence?
Transforming access, discovery, use
A single digitised item can be infinitely linked to places, people, concepts – how does this change 'discovery'?
What other user needs can we meet through a combination of AI, better data systems and public participation?
I had a strict five minute slot for my talk in the panel on 'Reimagining the past with AI' at Turing's AI UK event today, so wrote out my notes and thought I might as well share them…
The panel blurb was 'The past shapes the present and influences the future, but the historical record isn’t straightforward, and neither are its digital representations. Join the AHRC project Living with Machines and friends on their journey to reimagine the past through AI and data science and the challenges and opportunities within.' It was a delight to chat with Dave Beavan, Mariona Coll Ardanuy, Melodee Wood and Tim Hitchcock.
My prepared talk: A bit about the British Library for those who aren't familiar with it. It's one of the two biggest libraries in the world, and it’s the national library for the UK.
Its collections are vast – somewhere between 180 and 200 million collection items, including 14 million books; hundreds of terabytes of archived websites; over 600,000 bound volumes of historical newspapers (of which about 60 million pages have been digitised with partners FindMyPast so far)…
We've been working with crowdsourcing – which we defined as working with the public on tasks that contribute to a shared, significant goal related to cultural heritage collections or knowledge – for about a decade now. We've collected local sounds and accents around Britain, georeferenced gorgeous historical maps, matched card catalogue records in Urdu and Chinese to digital catalogue records, and brought the history of theatre across the UK to life via old playbills.
Some of our crowdsourcing work is designed to help improve the discoverability of cultural heritage collections, and some, like our work with Living with Machines, is designed to build datasets to help answer wider research questions.
In all cases, our work with crowdsourcing is closely aligned with the BL's mission: it helps make our shared intellectual heritage available for research, inspiration and enjoyment.
We think of crowdsourcing activities as a form of digital volunteering, where participation in the task is rewarding in its own right. Our crowdsourcing projects are a platform for privileged access and deeper engagement with our digitised collections. They're an avenue for people who wouldn't normally encounter historical records close up to work with them, while helping make those items easier for others to access.
Through Living with Machines, we've worked out how to design tasks that fit into computational linguistic research questions and timelines…
So that's all great – but… the scale of our collections is hard to ignore. Individual crowdsourcing tasks that make items more accessible by transcribing or classifying items are beyond the capacity of even the keenest crowd. Enter machine learning, human computation, human in the loop…
While we're keen to start building systems that combine machine learning and human input to help scale up our work, we don't want to buy into terms like 'crowdworkers' or ‘gig work’ that we see in some academic and commercial work. If crowdsourcing is a form of public engagement, as well as a productive platform for tasks, we can't think of our volunteers as 'cogs' in a system.
We think that it's important to help shape the future of 'human computation' systems; to ensure that work on machine learning / AI is in alignment with Library values. We look to work that peers at the Library of Congress are doing to create human-in-the-loop systems that 'cultivate responsible practices'.
We want to retain the opportunities for the public to get started with simpler tasks based on historical collections, while also being careful not to 'waste clicks' by having people do tasks that computers can do faster.
With Living with Machines, we've built tasks that provide opportunities for participants to think about how their classifications form training datasets for machine learning.
So my questions for the next year are: how can we design human computation systems that help participants acquire new literacies and skills, while scaling up and amplifying their work?
I’ve just spent Monday and Tuesday in New York for a workshop on ‘Museums + AI’. Funded by the AHRC and led by Oonagh Murphy and Elena Villaespesa, this was the second workshop in the year-long project.
As there’s so much interest in artificial intelligence /
machine learning / data science right now, I thought I’d revive the lost art of
event blogging and share my notes. These notes are inevitably patchy, so keep
an eye out for more formal reports from the team. I’ve used ‘museum’
throughout, as in the title of the event, but many of these issues are relevant
to other collecting institutions (libraries, archives) and public venues. I’m
writing this on the Amtrak to DC so I’ve been lazy about embedding links in
text – sorry!
After a welcome from Pratt (check out their student blog https://museumsdigitalculture.prattsi.org/), Elena’s opening remarks introduced the two themes of the workshop: AI + visitor data and AI + Collections data. Questions about visitor data include whether museums have the necessary data governance and processes in place; whether current ethical codes and regulations are adequate for AI; and what skills staff might need to gain visitor insights with AI. Questions about collections data include how museums can minimise algorithmic biases when interpreting collections; whether the lack of diversity in both museum and AI staff would be reflected in the results; and the implications of museums engaging with big tech companies.
Achim Koh’s talk raised many questions I’ve had as we’ve
thought about AI / machine learning in the library, including how staff
traditionally invested with the authority to talk about collections (curators,
cataloguers) would feel about machines taking on some of that work. I think
we’ve broadly moved past that at the library if we can assume that we’d work
within systems that can distinguish between ‘gold standard’ records created by
trained staff and those created by software (with crowdsourced data somewhere
in between, depending on the project).
John Stack and Jamie Unwin from the (UK) Science Museum shared some of the challenges of using pre-built commercial
models (AWS Rekognition and Comprehend) on museum collections – anything long and thin is marked as a
'weapon' – and demonstrated a nice tool for seeing 'what the machine saw' https://johnstack.github.io/what-the-machine-saw/.
They don’t currently show machine-generated tags to users, but they’re used
behind-the-scenes for discoverability. Do we need more transparency about how
search results were generated? And will machine tags ever be completely safe
to show people without vetting, even if confidence scores and software versions
are included with the tags?
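To make the vetting question concrete, here is a minimal sketch of keeping provenance alongside each tag so that machine tags can help search without being shown to people. The field names, the example tags and the 0.9 threshold are all assumptions for illustration, not a description of the Science Museum's system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tag:
    label: str
    source: str                          # "staff", "crowd" or "machine" (hypothetical categories)
    confidence: Optional[float] = None   # only meaningful for machine tags
    model_version: Optional[str] = None  # which software produced the tag
    vetted: bool = False                 # has a person checked this tag?


def searchable(tags):
    """Every tag can support discoverability behind the scenes."""
    return [t.label for t in tags]


def displayable(tags, min_confidence=0.9):
    """Show human-made tags; show machine tags only if vetted and confident."""
    shown = []
    for t in tags:
        if t.source != "machine":
            shown.append(t.label)
        elif t.vetted and t.confidence is not None and t.confidence >= min_confidence:
            shown.append(t.label)
    return shown


# Invented example: a long, thin object that a model has labelled 'weapon'.
tags = [
    Tag("telescope", source="staff"),
    Tag("weapon", source="machine", confidence=0.95, model_version="demo-1"),
    Tag("brass", source="machine", confidence=0.97, model_version="demo-1", vetted=True),
]

print(searchable(tags))
print(displayable(tags))
```

In this sketch a high confidence score is not enough on its own: the unvetted 'weapon' tag stays hidden however sure the model claims to be.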
Andrew Lih talked about image classification work with the Metropolitan Museum and Wikidata which picked up on the issue of questionable tags. Wikidata has a game-based workflow for tagging items, which in addition to tools for managing vandalism or miscreants allows them to trust the ‘crowd’ and make edits live immediately. Being able to sift incorrect from correct tags is vital – but this in turn raises questions of ‘round tripping’ – should a cultural institution ingest the corrections? (I noticed this issue coming up a few times because it’s something we’ve been thinking about as we work with a volunteer creating Wikidata that will later be editable by anyone.) Andrew said that the Met project put AI more firmly into the Wikimedia ecosystem, and that more is likely to come. He closed by demonstrating how the data created could put collections in the centre of networks of information http://w.wiki/6Bf Keep an eye out for the Wiki Art Depiction Explorer https://docs.google.com/presentation/d/1H87K5yjlNNivv44vHedk9xAWwyp9CF9-s0lojta5Us4/edit#slide=id.g34b27a5b18_0_435
Jeff Steward from Harvard Art Museums gave a thoughtful talk
about how different image tagging and captioning tools (Google Vision, Imagga,
Clarifai, Microsoft Cognitive Services) saw the collections, e.g. Imagga might
talk about how fruit depicted in a painting tastes: sweet, juicy; how a bowl is
used: breakfast, celebration. Microsoft’s tagging and captioning tools have different
views, and don’t draw on each other.
Chris Alen Sula led a great session on ‘Ethical
Considerations for AI’.
That evening, we went to an event at the Cooper Hewitt for more discussion of https://twitter.com/hashtag/MuseumsAI and the launch of their Interaction Lab https://www.cooperhewitt.org/interaction-lab/ Andrea Lipps and Harrison Pim’s talks reminded me of earlier discussion about holding cultural institutions to account for the decisions they make about AI, surveillance capitalism and more. Workshops like this (and the resulting frameworks) can provide the questions but senior staff must actually ask them, and pay attention to the answers. Karen Palmer’s talk got me thinking about what ‘democratising AI’ really means, and whether it’s possible to democratise something that relies on training data and access to computing power. Democratising knowledge about AI is a definite good, but should we also think about alternatives to AI that don’t involve classifications, and aren’t so closely linked to surveillance capitalism and ad tech?
The next day began with an inspiring talk from Effie Kapsalis on the Smithsonian Institution’s American Women’s History Initiative https://womenshistory.si.edu/ They’re thinking about machine learning and collections as data to develop ethical guidelines for AI and gender, analysing representations of women in multidisciplinary collections, enhancing data at scale and infusing the web with semantic data on historical women.
Shannon Darrough, MoMA, talked about a machine learning
project with Google Arts and Culture to identify artworks in 30,000
installation photos, based on 70,000 collection images https://moma.org/calendar/exhibitions/history/identifying-art
It was great at 2D works, not so much 3D, installation, moving image or
performance art works. The project worked because they identified a clear
problem that machine learning could solve. His talk led to discussion about
sharing training models (i.e. once software is trained to specialise in
particular subjects, others can re-use the ‘models’ that are created), and the
alignment between tech companies’ goals (generally, shorter-term,
self-contained) and museums’ (longer-term, feeding into core systems).
I have fewer notes from talks by Lawrence Swiader (American Battlefield Trust) with good advice on human-centred processes, Juhee Park (V&A) on frameworks for thinking about AI and museums, Matthew Cock (VocalEyes) on chat bots for venue accessibility information, and Carolyn Royston and Rachel Ginsberg (on the Cooper Hewitt’s Interaction Lab), but they added to the richness of the day. My talk was on ‘operationalising AI at a national library’; my slides are online https://www.slideshare.net/miaridge/operationalising-ai-at-a-national-library The final activity was on ‘managing AI’, a subject that’s become close to my heart.
Before we start: in the spirit of the mid-2000s, I thought I'd have a go at blogging about events again. I've realised I miss the way that blogging and reading other people's posts from events made me feel part of a distributed community of fellow travellers. Journal articles don't have the same effect (they're too long and jargony for leisure readers, assuming they're accessible outside universities at all), and tweets are great for connecting with people, but they're very ephemeral. Here goes…
On September 3 I was at BBC Broadcasting House for 'AI, Society & the Media: How can we Flourish in the Age of AI?' by BBC, LCFI and The Alan Turing Institute. Artificial intelligence is a hot topic so it was a sell-out event. My notes are very partial (in both senses of the word), and please do let me know if there are errors. The event hashtag will provide more coverage: https://twitter.com/hashtag/howcanweflourish.
The first session was 'AI – What you need to know!'. Matthew Postgate began by providing context for the BBC's interest in AI. 'We need a plurality of business models for AI – not just ad-funded' – yes! The need for different models for AI (and related subjects like machine learning) was a theme that recurred throughout the day (and at other events I was at this week).
Adrian Weller spoke on the limitations of AI. It's data hungry, compute intensive, poor at representing uncertainty, easily fooled by adversarial examples (and more that I missed). We need sensible measures of trustworthiness including robustness, fairness, protection of privacy, transparency.
Been Kim shared Google's AI principles: https://ai.google/principles She's focused on interpretability – goals are to ensure that our values are aligned and our knowledge is reflected. She emphasised the need to understand your data (another theme across the day and other events this week). You can build an inherently interpretable machine model (so it can explain its reasoning) or can build an interpreter, enabling conversations between humans and machines. You can then uncover bias using the interpreter, asking what weight it gave to different aspects in making decisions.
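As a toy illustration of the 'inherently interpretable' case only (this is not Been Kim's method, and the feature names and weights are invented), a simple weighted-sum model lets you ask what each input contributed to a decision:

```python
# Hypothetical weights for a toy linear scoring model; not from any real system.
WEIGHTS = {"years_experience": 0.5, "test_score": 0.3, "postcode_group": -0.8}


def score(features):
    """Weighted sum of the input features."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


def explain(features):
    """Each feature's contribution to the score, largest absolute effect first."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)


applicant = {"years_experience": 4, "test_score": 7, "postcode_group": 3}

print(round(score(applicant), 2))
for name, contribution in explain(applicant):
    print(name, round(contribution, 2))
```

Here the largest single effect comes from the made-up `postcode_group` feature, the kind of proxy that a conversation about bias would want to surface.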
Jonnie Penn (who won me with an early shout out to the work of Jon Agar) asked, from where does AI draw its authority? AI is feeding a monopoly of Google-Amazon-Facebook who control the majority of internet traffic and advertising spend. Power lies in choosing what to optimise for, and choosing what not to do (a tragically poor paraphrase of his example of advertising to children, but you get the idea). We need 'bureaucratic biodiversity' – need lots of models of diverse systems to avoid calcification.
Kate Coughlan – only 10% of people feel they can influence AI. They looked at media narratives re AI on axes of time (ease vs obsolescence), power (domination vs uprising), desire (gratification vs alienation), life (immortality vs inhumanity). Their survey found that each aspect was equally disempowering. Passivity drives negative outcomes re feelings about change, tech – but if people have agency, then it's different. We need to empower citizens to have an active role in shaping AI.
The next session was 'Fake News, Real Problems: How AI both builds and destroys trust in news'. Ryan Fox spoke on 'manufactured consensus' – we're hardwired to agree with our community so you can manipulate opinion by making it look like everyone else thinks a certain way. Manipulating consensus is currently legal, though against social network T&S. 'Viral false narratives can jeopardise brand trust and integrity in an instant'. Manufactured outrage campaigns etc. They're working on detecting inorganic behaviour through the noise – it's rapid, repetitive, sticky, emotional (missed some).
One of the panel questions was, would AI replace journalists? No, it's more like having lots of interns – you wouldn't have them write articles. AI is good for tasks you can explain to a smart 16 year old in the office for a day. The problematic ad-based model came up again – who is the arbiter of truth (e.g. fake news on Facebook). Who's paying for those services and what power does it give them?
This panel made me think about discussions about machine learning and AI at work. There are so many technical, contextual and ethical challenges for collecting institutions in AI, from capturing the output of an interactive voice experience with Alexa, to understanding and recording the difference between Russia Today as a broadcast news channel and as a manipulator of YouTube rankings.
Next was a panel on 'AI as a Creative Enabler'. Cassian Harrison spoke about 'Made By Machine', an experiment with AI and archive programming. They used scene detection, subtitle analysis, visual 'energy', machine learning on the BBC's Redux archive of programmes. Programmes were ranked by how BBC4 they were; split into sections then edited down to create mini BBC4 programmes.
Kanta Dihal and Stephen Cave asked why AI fascinates us in a thoughtful presentation. It's between dead and alive, uncanny (and lots more but clearly my post-lunch notetaking isn't the best).
Anna Ridler and Amy Cutler have created an AI-scripted nature documentary (trained on and re-purposing a range of tropes and footage from romance novels and nature documentaries) and gave a brilliant presentation about AI as a medium and as a process. Anna calls herself a dataset artist, rather than a machine learning artist. You need to get to know the dataset, look out for biases and mistakes, understand the humanness of decisions about what was included or excluded. Machines enact distorted versions of language.
I don't have notes from 'Next Gen AI: How can the next generation flourish in the age of AI?' but it was great to hear about hackathons where teenagers could try applying AI. The final session was 'The Conditions for Flourishing: How to increase citizen agency and social value'. Hannah Fry – once something is dressed up as an algorithm it gains some authority that's hard to question. Diane Coyle talked about 'general purpose technologies', which transform one industry then others. Printing, steam, electricity, internal combustion engine, digital computing, AI. Her 'lessons for the era of AI' were: 'all technology is social; all technologies are disruptive and have unpredictable consequences; all successful technologies enhance human freedoms', and accordingly she suggested we 'think in systems; plan for change; be optimistic'.
Konstantinos Karachalios called for a show of hands re who feels they have control over their data and what's done with it? Very few hands were raised. 'If we don't act now we'll lose our agency'.
I'm going to give the final word to Terah Lyons as the key takeaway from the day: 'technology is not destiny'.
I didn't hear a solution to the problems of 'fake news' that doesn't require work from all of us. If we don't want technology to be destiny, we all need to pay attention to the applications of AI in our lives, and be prepared to demand better governance and accountability from private and government agents.
(A bonus 'question I didn't ask' for those who've read this far: how do BBC aims for ethical AI relate to the introduction of compulsory registration to access TV and radio? If I turn on the radio in my kitchen, my listening habits aren't tracked; if I listen via the app they're linked to my personal ID).
I've developed this exercise on computational data generation and entity extraction for various information/data visualisation workshops I've been teaching lately. These exercises help demonstrate the biases embedded in machine learning and 'AI' tools. As these methods have become more accessible, my dataviz workshops have included more discussion of computational methods for generating data to be visualised. There are two versions of the exercise – the first works with images, the second with text.
In teaching I've found that services that describe images were more accessible and generated richer discussion in class than text-based sites, but it's handy to have the option for people who work with text. If you try something like this in your classes I'd love to hear from you.
It's also a chance to talk about the uses of these technologies in categorising and labelling our posts on social media. We can tell people that their social media posts are analysed for personality traits and mentions of brands, but seeing it in action is much more powerful.
Image exercise: trying computational data generation and entity extraction
Time: c. 5 – 10 minutes plus discussion.
Goal: explore methods for extracting information from text or an image and reflect on what the results tell you about the algorithms
1. Find a sample image
Find an image (e.g. from a news site or digitised text) you can download and drag into the window. It may be most convenient to save a copy to your desktop. Many sites let you load images from a URL, so right- or control-clicking to copy an image location for pasting into the site can be useful.
2. Work in your browser
It's probably easiest to open each of these links in a new browser window. It's best to use Firefox or Chrome, if you can. Safari and Internet Explorer may behave slightly differently on some sites. You should not need to register to use these sites – please read the tips below or ask for help if you get stuck.
Clarifai https://www.clarifai.com/demo – you can drag and drop, open the file explorer to find an image, or load one from a URL via the large '+' in the bottom right-hand corner. You can adjust settings via the 'Configure' tab.
Google Cloud Vision API https://cloud.google.com/vision/ – don't sign up, scroll down to the 'Try the API' box. Drag and drop your image on the box or click the box to open the file finder. You may need to go through the 'I am not a robot' process.
IBM Watson Visual Recognition https://visual-recognition-demo.mybluemix.net/ – scroll to 'Try the service'. Drag an image onto the grey box or click in the grey box to open the file finder. You can also load an image directly from a URL. (You can no longer try this without signing up so it doesn't work for a quick exercise).
Make notes, or discuss with your neighbour. Be prepared to report back to the group.
What attributes does each tool report on?
Which attributes, if any, were unique to a service?
Based on this, what do companies like Clarifai, Google, IBM and Microsoft seem to think is important to them (or to their users)? (e.g. what does 'safe for work' really mean?)
Who are their users – the public or platform administrators?
How many of the possible entities (concepts, people, places, events, references to time or dates, etc) did it pick up?
Is any of the information presented useful?
Did it label anything incorrectly?
What options for exporting or saving the results did the demo offer? What about the underlying service or software?
For tools with configuration options – what could you configure? What difference did changing classifiers or other parameters make?
If you tried it with a few images, did it do better with some than others? Why might that be?
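If you want to go beyond eyeballing the results, one option is to copy the labels from each demo page into a short script and compare them. This is a minimal sketch: the labels below are invented, typed in by hand and lower-cased, and nothing here calls the services themselves.

```python
# Hypothetical labels copied by hand from three demo pages for the same image.
results = {
    "Clarifai": {"people", "street", "city", "safe for work"},
    "Google": {"street", "city", "pedestrian", "urban area"},
    "IBM": {"people", "city", "road"},
}


def unique_labels(results):
    """Labels reported by exactly one service."""
    unique = {}
    for service, labels in results.items():
        others = set().union(
            *(other for name, other in results.items() if name != service)
        )
        unique[service] = labels - others
    return unique


def shared_labels(results):
    """Labels that every service agreed on."""
    return set.intersection(*results.values())


print("Shared:", shared_labels(results))
for service, labels in unique_labels(results).items():
    print(service, "only:", sorted(labels))
```

The labels that only one service reports are often the most useful prompts for the discussion about what each company thinks is important.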
Text exercise: trying computational data generation and entity extraction
Time: c. 5 minutes plus discussion
Goal: explore the impact of source data and algorithms on input text
1. Grab some text
You will need some text for this exercise. The more 'entities' – people, places, dates, concepts – discussed, the better. If you have some text you're working on handy, you can use that. If you're stuck for inspiration, pick a front page story from an online news site. Keep the page open so you can copy a section of text to paste into the websites.
2. Compare text entity labelling websites
Open four or more browser windows or tabs. Open the links below in separate tabs or windows so you can easily compare the results.
How many possible entities (concepts, people, places, events, references to time or dates, etc) did each tool pick up? Is any of the other information presented useful?
Did it label anything incorrectly?
What if you change classifiers or other parameters?
Does it do better with different source material?
What differences did you find between the two tools? What do you think caused those differences?
How much can you find out about the tools and the algorithms they use to create labels?
Where does the data underlying the process come from?
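For the text version, a similar hand-made comparison can help with the 'what differences did you find' question. This is a rough sketch that assumes you have pasted each tool's output into a list of (text, type) pairs; the entities and type names below are invented for illustration.

```python
from collections import Counter

# Hypothetical output pasted from two entity-labelling demos for the same paragraph.
tool_a = [("Paris", "PLACE"), ("Turing", "PERSON"), ("3 September", "DATE")]
tool_b = [("Paris", "PERSON"), ("Turing", "PERSON"), ("BBC", "ORGANISATION")]


def counts_by_type(entities):
    """How many entities of each type a tool reported."""
    return Counter(kind for _, kind in entities)


def disagreements(a, b):
    """Entities both tools found but labelled with different types."""
    a_types, b_types = dict(a), dict(b)
    return {
        text: (a_types[text], b_types[text])
        for text in a_types.keys() & b_types.keys()
        if a_types[text] != b_types[text]
    }


def missed(a, b):
    """Entities found by the first tool but not by the second."""
    return {text for text, _ in a} - {text for text, _ in b}


print(counts_by_type(tool_a))
print(disagreements(tool_a, tool_b))
print(missed(tool_a, tool_b))
```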
Spoiler alert!
.@mia_out: "According to image recognition software, the world can be divided into safe for work & not safe for work" #beyondtheblackbox