I met dr. rosa a. eberly, associate professor of rhetoric at Pennsylvania State University, when she took my and Thomas Padilla's 'Collections as Data' course at the HILT summer school in 2018. When she got in touch to ask if I could contribute to a workshop on Harry Shearer's Le Show archive, of course I said yes! That event became the CAS 2023 Summer Symposium on Harry Shearer's "Le Show".
My slides for 'Resonating with different frequencies… Thoughts on public humanities through crowdsourcing in a ChatGPT world' are online at Zenodo. My planned talk notes are below.
Opening – I’m sorry I can’t be in the room today, not least because the programme lists so many interesting talks.
Today I wanted to think about the different ways that public humanities work through crowdsourcing still has a place in an AI-obsessed world… what happens if we think about different ways of ‘listening’ to an audio archive like Le Show, by people, by machines, and by people and machines in combination?
What visions can we create for a future in which people and machines tune into different frequencies, each doing what they do best?
- My work in crowdsourcing / data science in GLAMs
- What can machines do?
- The Le Show archive (as described by Rosa)
- Why do we still need people listening to Le Show and other audio archives?
My current challenge is working out the role of crowdsourcing when 'AI can do it all'…
Of course AI can't, but we need to articulate what people and what machines can do so that we can set up systems that align with our values.
If we leave it to the commercial sector and pure software companies, there's a risk that people are regarded as part of the machine, or are replaced by AI rather than aided by it.
[Then I did a general 'crowdsourcing and data science in cultural heritage / British Library / Living with Machines' bit]
Given developments in 'AI' (machine learning)… What can AI/data science do for audio?
- Transcribe speech to support text-based search and text-based methods
- Detect some concepts, entities, emotions –> metadata for findability
- Support 'distant reading'
– Shifts, motifs, patterns over time
– Collapse hours, years – take time out of the equation
- Machine listening?
– Use 'similarity' to find sonic (not text) matches?
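To make the 'machine listening' idea above a little more concrete: a common approach is to have an audio model turn each clip into a numeric 'embedding' vector, then compare vectors rather than text. The sketch below is entirely hypothetical – the segment names and vectors are invented, and in practice the embeddings would come from an audio model, not be hand-written – but it shows the core comparison step, cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Invented embeddings for three imaginary Le Show segments.
# Real embeddings would be produced by an audio model and be much longer.
segments = {
    "sketch_apologies": [0.9, 0.1, 0.1],
    "news_of_bees":     [0.1, 0.9, 0.1],
    "music_interlude":  [0.1, 0.1, 0.9],
}

def most_similar(query, library):
    """Return the name of the library segment 'closest' in sound to the query."""
    return max(library, key=lambda name: cosine_similarity(library[name], query))

query = [0.8, 0.2, 0.1]  # embedding of an unlabelled clip
print(most_similar(query, segments))
```

The appeal for an archive like Le Show is that this kind of matching works on sound itself, so it can surface recurring music beds or sketches that never appear in a transcript.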
[Description of the BBC World Archive experiments c 2012 combining crowdsourcing with early machine learning https://www.bbc.co.uk/blogs/researchanddevelopment/2012/11/the-world-service-archive-prot.shtml]
Le Show (as described by Rosa)
- A massive 'portal' of 'conceptual and sonic hyperlinks to late-20th- and early-21st-century news and culture'
- A 'polyphonic cornucopia of words and characters, lyrics and arguments, fact and folly'
- 'resistant to datafication'
- With koine topoi – issues of common or public concern
'Harry Shearer is a portal: Learn one thing from Le Show, and you’ll quickly learn half a dozen more by logical consequence' – dr. rosa a. eberly
(Le Show reminds me of a time when news was designed to inform more than enrage.)
Why let machines have all the fun?
People can hear a richer range of emotions, topics and references, and recognise impersonations and characters –> better metadata and findability
What can’t machines do? Software might be able to transcribe speech with pretty high accuracy, but it can't (reliably)… recognise humour, sarcasm, rhetorical flourishes, impersonations and characters – all the wonderful characteristics of the Le Show archive that Rosa described in her opening remarks yesterday. A lot of emotions aren’t covered in the ‘big 8’ that software tries to detect.
Software can recognise some subjects that e.g. have Wikipedia entries, but it’d also miss so much of what people can hear.
So, people can do a better job of telling us what's in the archive than computers can. Together, people and computers can help make specific moments more findable, creating metadata that could be used to visualise links between shows – by topic, by tone, by music and more.
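As a toy illustration of how crowdsourced metadata could link shows for visualisation – with episode dates and tags invented purely for the example – episodes that share tags become edges in a graph:

```python
from itertools import combinations

# Hypothetical crowdsourced tags per episode (invented for illustration).
episode_tags = {
    "2019-05-05": {"news of microplastics", "apologies of the week"},
    "2019-05-12": {"news of bees", "apologies of the week", "impersonation"},
    "2019-05-19": {"news of microplastics", "news of bees"},
}

def shared_tag_links(tags_by_episode):
    """Return (episode_a, episode_b, shared_tags) edges for a link graph."""
    links = []
    for a, b in combinations(sorted(tags_by_episode), 2):
        shared = tags_by_episode[a] & tags_by_episode[b]
        if shared:  # only link episodes with at least one tag in common
            links.append((a, b, shared))
    return links

for a, b, shared in shared_tag_links(episode_tags):
    print(f"{a} <-> {b}: {sorted(shared)}")
```

Fed into any network-visualisation tool, edges like these would let a listener hop between episodes by recurring segment, topic or character – the kind of connection machine transcription alone would miss.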
Could access to history in the raw, 'koine topoi' be a super-power?
Individual learning via crowdsourcing contributes to an informed, literate society
It's not all about the data. Crowdsourcing creates a platform and a reason for engagement. Your work helps others, but it also helps you.
I've shown some of my work with objects from the history of astronomy, playbills for 19th-century British theatre performances and, most recently, newspaper articles from the long 19th century.
Through this work, I've come to believe that giving people access to original historical sources is one of the most important ways we can contribute to an informed, literate society.
A society that understands where we've come from, and what that means for where we're going.
A society that is less likely to fall for predictions of AI doom or AI fantasies, because they've seen tech hype before.
A society that is less likely to believe that 'AI might take your job' because they know that the executives behind the curtain are the ones deciding whether AI helps workers or 'replaces' them.
I've worried about whether volunteers would be motivated to help transcribe audio or text, classify or tag images, when 'AI can do it'. But then I remembered that people still knit jumpers (sweaters) when they can buy them far more quickly and cheaply.
So, crowdsourcing still has a place. The trick is to find ways for 'AI' to aid people, not replace them: to work out which bits are boring, and which bits software is great at, so that people can spend more time on the fun bits.
Harry Shearer's ability to turn something into a topic – 'news of microplastics', 'news of bees' – is something of a super-power. To amplify those messages is another gift, one the public can create by and for themselves.