Fantastic Futures 2023 – AI4LAM in Vancouver

Reflections and selected highlights from the Fantastic Futures 2023 conference, held at Internet Archive Canada's building in Vancouver and at Simon Fraser University. The full programme is online; videos will be coming soon.

A TL;DR is that it's incredible how many of the projects discussed wouldn't have been possible (or would have been less feasible) a year ago. Whisper and ChatGPT (4, even more than 3.5) and many other new tools really have brought AI (machine learning) within reach. Also, the fact that I can *copy and paste text from a photo* is still astonishing. Some fantastic parts of the future are already here.

Other thinking aloud / reflections on themes from the event: the gap between experimentation and operationalisation for AI in GLAMs is still huge. Some folk are desperate to move onto operationalisation, others are enjoying the exploration phase – thinking about it, knowing where you and your organisation each stand on that could save a lot of frustration! Bridging it is possible, but it takes dedicated resources (including quality checking) from multi-disciplinary teams, and probably the goal has to be big and important enough to motivate all the work required. In examples discussed at FF2023, the scale of the backlog of collection items to be processed is that big important thing that motivates work with AI.

It didn't come up as directly, perhaps because many projects are still pilots rather than in production, but I'm very interested in the practical issues around including 'enriched' data from AI (or crowdsourcing) in GLAM collections management / cataloguing systems. We need records that can be enriched with transcriptions, keywords and other data iteratively over time, and that can record and display the provenance of that data – but can your collections systems do that?
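As a rough sketch of what that could look like in data terms (not a description of any real collections system), the Python below stores each enrichment as a separate 'assertion' with its own provenance, and appends rather than overwrites. The class names, field names and the 'expert' / 'crowd' / 'machine' categories are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    """One value for one field, plus a note of where it came from."""
    field_name: str      # e.g. "transcription" or "keyword"
    value: str
    source_type: str     # hypothetical categories: "expert", "crowd", "machine"
    source_name: str     # person, project or tool name
    source_version: str  # tool or model version; empty for people
    recorded_on: date


@dataclass
class CatalogueRecord:
    identifier: str
    assertions: List[Assertion] = field(default_factory=list)

    def enrich(self, assertion: Assertion) -> None:
        # Append rather than overwrite, so earlier values and their
        # provenance are still there when tools or opinions change.
        self.assertions.append(assertion)

    def values_for(self, field_name: str,
                   source_type: Optional[str] = None) -> List[Assertion]:
        # Return every assertion for a field, optionally limited to one source type.
        return [a for a in self.assertions
                if a.field_name == field_name
                and (source_type is None or a.source_type == source_type)]
```

A display layer could then call something like `record.values_for("keyword", "machine")` to show machine-generated keywords separately from expert ones, with the tool name and version alongside.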

Making LLMs stick to content in the item is hard; 'hallucinations' and loose interpretations of instructions are an issue. It's so useful hearing about things that didn't work or were hard to get right – common errors in different types of tools, etc. But who'd have thought that working with collections metadata would involve telling bedtime stories to convince LLMs to roleplay as an expert cataloguer?

Workflows are vital! So many projects have been assemblages of different machine learning / AI tools with some manual checking or correction.

A general theme in talks and chats was the temptation to lower 'quality' to be able to start to use ML/AI systems in production. People are keen to generate metadata with the imperfect tools we have now, but that runs into issues of trust for institutions expected to publish only gold standard, expert-created records. We need new conventions for displaying 'data in progress' alongside expert human records, and flexible workflows that allow for 'humans in the loop' to correct errors and biases.

If we are in an 'always already transitional' world where the work of migrating from one cataloguing standard or collections management tool to another is barely complete before it's time to move to the next format/platform, then investing in machine learning/AI tools that can reliably manage the process is worth it.

'Data ages like wine, software like fish' – but it used to take a few years for software to age, whereas now tools are outdated within a few months – how does this change how we think about 'infrastructure'? Looking ahead, people might want to re-run processes as tools improve (or break) over time, so they should be modular. Keep (and version) the data, don't expect the tool to be around forever.
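One way to picture 'keep (and version) the data' is a processing step that saves its output in a file named for the item, the tool and the tool's version, so a later re-run with a newer tool sits alongside the old output instead of replacing it. This is a minimal sketch only; the function name, file naming pattern and JSON layout are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Callable


def run_step(item_id: str, text: str, tool_name: str, tool_version: str,
             process: Callable[[str], str], out_dir: Path) -> Path:
    """Run one processing step and save its output with the tool details."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # One file per item, tool and version: re-running with a new version
    # adds a file rather than overwriting the earlier result.
    out_path = out_dir / f"{item_id}__{tool_name}__{tool_version}.json"
    result = {
        "item_id": item_id,
        "tool": tool_name,
        "tool_version": tool_version,
        "output": process(text),
    }
    out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return out_path
```

Because each step only takes text in and writes a file out, a step can be swapped or re-run without touching the others.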

Update to add: Francesco Ramigni posted on ACMI at Fantastic Futures 2023, and Emmanuelle Bermès posted Toujours plus de futurs fantastiques ! (édition 2023).

FF2023 workshops

I ran a workshop and went to two others the day before the conference proper began. I've put photos from the workshop I ran with Thomas Padilla (originally proposed with Nora McGregor and Silvia Gutiérrez De la Torre too) on Co-Creating an AI Responsive Information Literacy Curriculum workshop on Flickr. You can check out our workshop prompts and links to the 'AI literacy' curricula devised by participants.

Fantastic Futures Day 1

Thomas Mboa opens with a thought-provoking keynote. Is AI in GLAMs a Pharmakon (a purification ritual in ancient Greece where criminals were expelled)? Pharmakon can mean both medicine and poison.

He also discusses AI as technocoloniality, e.g. Libraries in the age of technocoloniality: Epistemic alienation in African scholarly communications.

Mboa asks / challenges the GLAM community:

Can we ensure cultural integrity alone, from our ivory tower?
How can we involve data-providers communities without exploiting them?
AI feeds on data, which in turn conveys biases. How can we ensure the quality of data?

Cultural integrity is a measure of the wholeness or intactness of material, whether it respects and honours traditional ownership, traditions and knowledge

Mboa on AI for Fair Work – avoiding digital extractivism; the need for data justice e.g. https://www.gpai.ai/projects/future-of-work/AI-for-fair-work-report2022.pdf.

Thomas Mboa finishes with 'Some Key actions to ensure responsible use of AI in GLAM':

Develop Ethical Guidelines and Policies
Address Bias and Ensure Inclusivity
Enhance Privacy and Data Security
Balance AI with Human Expertise
Foster Digital Literacy and Skills Development
Promote Sustainable and Eco-friendly Practices
Encourage Collaboration and Community Engagement
Monitor and Evaluate AI Impact
Intellectual Property and Copyright Considerations
Preserve Authenticity and Integrity

I shared lessons for libraries and AI from Living with Machines, then there was a shared presentation on Responsible AI and governance – transparency/notice and clear explanations; risk management; ethics/discrimination, data protection and security.

Mike Trizna (and Rebecca Dikow) on the Smithsonian's AI values statement. Why We Need an AI Values Statement – everyone at the Smithsonian involved in data collection, creation, dissemination, and/or analysis is a stakeholder – Our goal is to aspirationally and proactively strive toward shared best practices across a distributed institution. All staff should feel like their expertise matters in decisions about technology.

Jill Reilly mentioned 'archivists in the loop' and 'citizen archivists in the loop' at NARA, and Inventory of NARA Artificial Intelligence (AI) Use Cases

From Bart Murphy (and Mary Sauer Games)'s talk it seems OCLC are really doing a good job operationalising AI to deduplicate catalogue entries at scale, maintaining quality and managing cost of cloud compute; also keeping ethics in mind.

Next, William Weaver on 'Navigating AI Advancements with VoucherVision and the Specimen Label Transcription Project' – using OCR to extract text from digitised herbarium sheets (vouchers) and machine learning to parse messy OCR. More solid work on quality control! Their biggest challenge is 'hallucinations' and also LLM imprecision in following their granular rules. More on this at The Future of Natural History Transcription: Navigating AI advancements with VoucherVision and the Specimen Label Transcription Project (SLTP).

Next, Abigail Potter and Laurie Allen, Introducing the LC Labs Artificial Intelligence Planning Framework. I love that LC Labs do the hard work of documenting and sharing the material they've produced to make experimentation, innovation and implementation of AI and new technologies possible in a very large library that's also a federal body.

Abby talked about their experiments with generating catalogue data from ebooks, co-led with their cataloguing department.

A panel discussed questions like: how do you think about "right sizing" your AI activities given your organizational capacity and constraints? How do you think about balancing R&D / experimentation with applying AI to production services / operations? How can we best work with the commercial sector? With researchers? What do you think the role of LAMs should be within the AI sector and society? How can we leverage each other as cultural heritage institutions?

I liked Stu Snydman's description of organising at Harvard to address AI with their values: embrace diverse perspectives, champion access, aim for the extraordinary, seek collaboration, lead with curiosity. And Ingrid Mason's description of NFSA's question about their 'social licence' (to experiment with AI) as an 'anchoring moment'. And there are so many reading groups!

Some of the final talks brought home how much more viable ChatGPT 4 has made some tasks, and included the first of two projects trying to work around the fact that people don't provide good metadata when depositing things in research archives.

Fantastic Futures Day 2

Day 2 begins with Mike Ridley on 'The Explainability Imperative' (for AI; XAI). We need trust and accountability because machine learning is consequential. It has an impact on our lives. Why isn't explainability the default?

His explainability priorities for LAM: HCXAI; Policy and regulation; Algorithmic literacy; Critical making.

Mike quotes 'Not everything that is important lies inside the black box of AI. Critical insights can lie outside it. Why? Because that's where the humans are.' Ehsan and Riedl.

Mike – explanations should be actionable and contestable. They should enable reflection, not just acquiescence.

Algorithm literacy for LAMs – embed into information literacy programmes. Use algorithms with awareness. Create algorithms with integrity.

A man presenting in a fancy old building. On a slide above, a quote and photo of a woman: "Technology designers are the new policymakers; we didn't elect them but their decisions determine the rules we live by." Latanya Sweeney Harvard University Director of the Public Interest Tech Lab

Policy and regulation for GLAMs – engage with policy and regulatory activities; insist on explainability as a core principle; promote an explanatory systems approach; champion the needs of the non-expert, lay person.

Critical making for GLAMs – build our own tools and systems; operationalise the principles of HCXAI; explore and interrogate for bias, misinformation and deception; optimise for social justice and equity

Mike quotes: "Technology designers are the new policymakers; we didn't elect them but their decisions determine the rules we live by." Latanya Sweeney (Harvard University, Director of the Public Interest Tech Lab)

Next: shorter talks on 'AI and collections management'. Jon Dunn and Emily Lynema shared work on AMP, an audiovisual metadata platform, built on https://usegalaxy.org/ for workflow management. (Someone mentioned https://airflow.apache.org/ yesterday – I'd love to know more about GLAMs' experiences with these workflow tools for machine learning / AI.)

Nice 'AI explorer' from Harvard Art Museums https://ai.harvardartmuseums.org/search/elephant presented by Jeff Steward. It's a really nice way of seeing art through the eyes of different image tagging / labelling services like Imagga, Amazon, Clarifai, Microsoft.

(An example I found: https://ai.harvardartmuseums.org/object/228608. Showing predicted tags like this is a good step towards AI literacy, and might provide an interesting basis for AI explainability as discussed earlier.)

Screenshot of a webpage with a painting of apples and the tags that different machine learning services predicted for the painting

Scott Young and Jason Clark (Montana State University) shared work on Responsible AI at Montana State University. And a nice quote from Kate Zwaard, 'Through the slow and careful adoption of tech, the library can be a leader'. They're doing 'irresponsible AI scenarios' – a bit like a project pre-mortem with a specific scenario e.g. lack of resources.

Emmanuel A. Oduagwu from the Department of Library & Information Science, Federal Polytechnic, Nigeria, calls for realistic and sustainable collaborations between developing countries – library professionals need technical skills to integrate AI tools into library service delivery; they can't work in isolation from ICT. How can other nations help?

Generative AI at JSTOR FAQ https://www.jstor.org/generative-ai-faq from Bryan Ryder / Beth LaPensee's talk. Their guiding principles (approximately):

Empowering researchers: focus on enhancing researchers' capabilities, not replacing their work
User-centred approach – technology that adapts to individuals, not the other way around
Trusted and reliable – maintain JSTOR's reputation for accurate, trustworthy information; build safeguards
Collaborative development – openly and transparently with the research community; value feedback
Continuous learning – iterate based on evidence and user input; continually refine

Michael Flierl shared a bibliography for Explainable AI (XAI).

Finally, Leo Lo and Cynthia Hudson Vitale presented draft guiding principles from the US Association of Research Libraries (ARL). Points include the need to include human review; prioritise the safety and privacy of employees and users; prioritise inclusivity; democratise access to AI and be environmentally responsible.

Live-blog from MCG's Museums+Tech 2022

The Museums Computer Group's annual conference has been an annual highlight for some years now, and in 2022 I donned my mask and went to their in-person event. And only a few months later I'm posting this lightly edited version of my Mastodon posts from the day of the event in November 2022… Notes in brackets are generally from the original toots/posts.

This was the first event that I live-blogged on Mastodon rather than live-tweeting. I definitely missed the to-and-fro of conversation around a hashtag, as in mid-November Mastodon was a lot quieter than it is even a few weeks later. Anyway, on with the post!

I'm at the Museums Computer Group's #MuseTech2022 conference.

Here's the programme https://museumscomputergroup.org.uk/events/museumstech-2022-turning-it-off-and-on-again/

Huuuuuuge thanks to the volunteers who worked so hard on the event – and as Chair Dafydd James says, who've put extra work into making this a hybrid event https://museumscomputergroup.org.uk/about/committee/

Keynote Kati Price on the last two and a half years – a big group hug or primal scream might help!

She's looking at the consequences of the pandemic and lockdowns in terms of: collaboration, content, cash, churn

Widespread adoption of tools as people found new ways of collaborating from home

Content – the 'hosepipe of requests' for digital content is all too familiar. Lockdown reduced things to one unifying goal – to engage audiences online

(In hindsight, that moment of 'we must find / provide entertainment online' was odd – the world was already full of books, tv, podcasts, videos etc – did we want things we could do together that were a bit like things we'd do IRL?)

V&A moved to capture their Kimono exhibition to share online just before closing for lockdown. Got a Time Out 'Time In'. No fancy tech, just good storytelling

Took a data-informed approach to creating content e.g. ASMR videos. Shows the benefits of 'format thinking'. Recommends https://podcasts.apple.com/us/podcast/episode-016-matt-locke/id1498470334?i=1000500799064 #MuseTech2022

V&A found that people either wanted very short or long form content; some wanted informative, others light-hearted content

Cash – how do you keep creating great experiences when income drops? No visitors, no income.

Churn – 'the great resignation' – we've seen a brain drain in the #MuseTech / GLAM sector, especially as it's hard to attract people given salaries. Not only in tech – loss of expert collections, research staff who help inform online content

UK's heading into recession, so more cuts are probably coming. What should a digital team look like in this new era?

Also, we're all burnt out. (Holler!) Emotional reserves are at an all-time low.

(Thinking about the silos – I feel my work-social circles are dwindling as I don't run into people around the building now most people are WFH most of the time)

Back from the break at #MuseTech2022 for more #MuseTech goodness, starting with Seb Chan and Indigo Holcombe-James on ACMI's CEO Digital Mentoring Program – could you pair different kinds of organisations and increase the digital literacy of senior leaders?

Working with a mentor had tangible and intangible benefits (in addition to making time for learning and reflection). The next phase was shorter, with fewer people. (Context for non-Australians – Melbourne's lockdown was *very* long and very restrictive)

(I wonder what a 'minimum viable mentorship' model might be – does a long coffee with someone count? I've certainly had my brain picked that way by senior leaders interested in digital participation and strategy)

Lessons – cross-art form conversations work really well; everyone is facing similar challenges

(Side note – I'm liking that longer posts mean I'm not dashing off posts to keep up with the talks)

Next up #MuseTech2022 Stephanie Bertrand https://twitter.com/sbrtrandcurator on prestige and aesthetic judgement in the art world. Can you recruit the public's collective intelligence to discover artworks? But can you remove the influence of official 'art world' taste makers in judging artworks?

'Social feedback is a catch-22' – can have runaway inequality where popular content becomes more popular, and artificial manipulation that skews what's valued?

Now Somaya Langley https://twitter.com/criticalsenses on making digital preservation an everyday thing. (Shoutout to the awesome #DigiPres folk who do this hard work) – how can a whole organisation include digital preservation in its wider thinking about collections and corporate records? What about collecting born-digital content so prevalent in modern life?

(Side note – Australia seems to have a much stronger record management culture within GLAMs than in the UK, where IME you really have to search to find organisational expectations about archiving project records)

#MuseTech2022 Somaya's lessons learnt include: use the three-legged stool of digital preservation of technology, resources and organisation https://deepblue.lib.umich.edu/bitstream/handle/2027.42/60441/McGovern-Digital_Decade.html?sequence=4 – approach it holistically

Help colleagues learn by doing

Moving from Projects to Programmes to Business as Usual is hard

Help people be comfortable with there not being one right answer, and ok with 'it depends'

#MuseTech2022 Next up in Session 2: Collections; Craig Middleton, Caroline Wilson-Barnao, Lisa Enright – documenting intense bushfires in Aus summer 2019/20 and COVID. They used Facebook as a short-term response to the crisis; planned a physical exhibition but a website came to seem more appropriate as COVID went on. https://momentous.nma.gov.au has over 300 unique responses. FB helpful for seeing if a collecting idea works while it's timely, but other platforms better for sustained engagement. Also need to think about comfort levels about sharing content changing as time goes on.

Museums can be places to have difficult conversations, to help people make sense of crises. But museums also need to think beyond physical spaces and include digital from the start.

Also hard when museum people are going through the same crises (links back to Kati's keynote about what we lived through as a sector working for our audiences while living through the pando ourselves)

#MuseTech2022 David Weinczok 'using digital media to go local'

60% of National Museums Scotland's online audiences have never visited their museums. 'Telling the story of an object without the context of the landscape and community it came from' can help link online and in-person audiences and experiences

'Museum Screen Time' – experts react to pop culture depictions of their subject area eg Viking culture https://www.nms.ac.uk/explore-our-collections/films/museum-screen-time-viking-age/

Blog series 'Objects in Place' – found items in collections from a particular area, looked to tell stories with objects as 'connective threads', not the focus in themselves

'What can we do online to make connections with people and communities offline?'

(So many speakers are finishing with questions – I love this! Way to make the most of being in conversation with the musetech community here)

Next at #MuseTech2022, Amy Adams & Karen Clarke, National Museum of the Royal Navy – digital was always lower priority before COVID; managed to do lots of work on collections data during lockdowns.

They finally got a digital asset management (DAM) system, but then had to think about maintaining it; explaining why implementation takes time. Then there was an expectation that they could 'flip a switch' and put all the collections online. Finding ways to have positive conversations with folk who are still learning about the #MuseTech field.

Also doing work on 'addressing empires' – I like that framing for a very British institution.

Now Rebecca Odell, Niti Acharya, Hackney Museum on surviving a cyber attack. Lost access to collections management database (CMS) and images. Like their digital building had burnt down. Stakeholder and public expectations did not adjust accordingly! 14 months without a CMS.

Know where your backups are! Export DBs as CSV, store it externally. LOCKSS, hard drives
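For 'export DBs as CSV', here's a minimal sketch using only Python's standard library. It assumes the database is a SQLite file, which a real collections management system probably isn't, so treat the connection details as a stand-in; the function name is made up for illustration. It writes one CSV per table, ready to copy somewhere external.

```python
import csv
import sqlite3
from pathlib import Path
from typing import List


def export_tables_to_csv(db_path: str, out_dir: str) -> List[str]:
    """Write every table in a SQLite database to its own CSV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal ones.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            headers = [col[0] for col in cursor.description]
            csv_path = out / f"{table}.csv"
            with csv_path.open("w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(headers)   # column names first
                writer.writerows(cursor)   # then every row in the table
            written.append(csv_path.name)
    finally:
        conn.close()
    return written
```

Calling `export_tables_to_csv("collections.db", "backup-export")` (both names hypothetical) would leave a folder of plain CSV files that can be opened without the original software.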

#MuseTech2022 Rebecca Odell, Niti Acharya, Hackney Museum continued – reconstructing your digital stuff from backups, exports, etc takes tiiiiiiime and lots of manual work. The sector needs guides, checklists, templates to help orgs prepare for cyber attacks.

(Lots of their advice also applies to your own personal digital media, of course. Back up your backups and put them lots of places. Leave a hard drive at work, swap one with a friend!)

New Q&A game – track the echo between remote speakers and the AV system in the back. Who's unmuted that should be muted? [One of the joys of a hybrid conference]

We'll be heading out to lunch soon, including the MCG annual general meeting

#MuseTech2022

(Missed a few talks post-lunch)

Adam Coulson (National Museums Scotland) on QR codes:
* weren't scanned in all exhibition/gallery contexts
* use them to add extra layers, not core content
* don't assume everyone will scan
* discourage FOMO (explain what's there)
* consider precious battery life

More at https://blog.nms.ac.uk/2022/07/19/qr-codes-in-museums-worth-the-effort/

Now Sian Shaw (Westminster Abbey) on no longer printing 12,000 sheets of paper a week (given out to visitors with that day's info). Made each order of service (dunno, church stuff, I am a heathen) at the same URL with templates to drop in commonly used content like hymns

It's a web page, not an app – more flexible, better affordances re your place on the page

Some loved the move to sustainability but others don't like having phones out in church.

Ultimately, be led by the problem you're trying to solve (and there's always a paper backup for no/dead phone folk)

Q&A discussion – take small steps, build on lessons learnt

#MuseTech2022 Onto the final panel, 'Funding digital – what two years worth of data tells us'

(It's funny when you have an insight into your own life via a remark at a conference #MuseTech2022 – the first ever museum team I worked in was 'Outreach' at Melbourne Museum, which combined my digital team with the learning team under the one director. I've always known that working in Outreach shaped my world view, but did sitting next to the learning team also shape it?)

And now Daf James is finishing with thanks for the committee members behind the MCG generally and the event in particular – big up @irny for keeping the tech going in difficult circumstances!

Daf James welcomes online and in-person attendees to the Museums Computer Group's Museums+Tech 2022 conference

Museums + AI, New York workshop notes

I’ve just spent Monday and Tuesday in New York for a workshop on ‘Museums + AI’. Funded by the AHRC and led by Oonagh Murphy and Elena Villaespesa, this was the second workshop in the year-long project.

Photo of workshop participants

As there’s so much interest in artificial intelligence / machine learning / data science right now, I thought I’d revive the lost art of event blogging and share my notes. These notes are inevitably patchy, so keep an eye out for more formal reports from the team. I’ve used ‘museum’ throughout, as in the title of the event, but many of these issues are relevant to other collecting institutions (libraries, archives) and public venues. I’m writing this on the Amtrak to DC so I’ve been lazy about embedding links in text – sorry!

After a welcome from Pratt (check out their student blog https://museumsdigitalculture.prattsi.org/), Elena’s opening remarks introduced the two themes of the workshop: AI + visitor data and AI + Collections data. Questions about visitor data include whether museums have the necessary data governance and processes in place; whether current ethical codes and regulations are adequate for AI; and what skills staff might need to gain visitor insights with AI. Questions about collections data include how museums can minimise algorithmic biases when interpreting collections; whether the lack of diversity in both museum and AI staff would be reflected in the results; and the implications of museums engaging with big tech companies.

Achim Koh’s talk raised many questions I’ve had as we’ve thought about AI / machine learning in the library, including how staff traditionally invested with the authority to talk about collections (curators, cataloguers) would feel about machines taking on some of that work. I think we’ve broadly moved past that at the library if we can assume that we’d work within systems that can distinguish between ‘gold standard’ records created by trained staff and those created by software (with crowdsourced data somewhere in between, depending on the project).

John Stack and Jamie Unwin from the (UK) Science Museum shared some of the challenges of using pre-built commercial models (AWS Rekognition and Comprehend) on museum collections – anything long and thin is marked as a 'weapon' – and demonstrated a nice tool for seeing 'what the machine saw' https://johnstack.github.io/what-the-machine-saw/. They don’t currently show machine-generated tags to users, but they’re used behind-the-scenes for discoverability. Do we need more transparency about how search results were generated – but will machine tags ever be completely safe to show people without vetting, even if confidence scores and software versions are included with the tags?
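To make the 'search only vs. shown to people' distinction concrete, here's a small sketch of one possible policy, not the Science Museum's actual approach: unvetted machine tags above a confidence threshold feed search, and only tags a person has checked are displayed. The class, the `vetted` flag and the 0.5 threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class MachineTag:
    label: str
    confidence: float     # 0.0 to 1.0, as reported by the tagging tool
    tool: str
    tool_version: str
    vetted: bool = False  # True once a person has checked the tag


def split_tags(tags: Iterable[MachineTag],
               search_threshold: float = 0.5) -> Tuple[List[MachineTag], List[MachineTag]]:
    """Return (search_only, displayable) lists of tags."""
    search_only, displayable = [], []
    for tag in tags:
        if tag.vetted:
            # Checked by a person: fine to show, whatever the score was.
            displayable.append(tag)
        elif tag.confidence >= search_threshold:
            # Unchecked but reasonably confident: use behind the scenes only.
            search_only.append(tag)
        # Unchecked, low-confidence tags are dropped from both lists.
    return search_only, displayable
```

Keeping the tool name and version on every tag means a batch can be re-checked or withdrawn if one version of a tool turns out to have a systematic problem.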

(If you’d like to see what all the tagging fuss is about, I have an older hands-on work sheet for trying text and images with machine classification software at https://www.openobjects.org.uk/2017/02/trying-computational-data-generation-and-entity-extraction/ )

Andrew Lih talked about image classification work with the Metropolitan Museum and Wikidata which picked up on the issue of questionable tags. Wikidata has a game-based workflow for tagging items, which in addition to tools for managing vandalism or miscreants allows them to trust the ‘crowd’ and make edits live immediately. Being able to sift incorrect from correct tags is vital – but this in turn raises questions of ‘round tripping’ – should a cultural institution ingest the corrections? (I noticed this issue coming up a few times because it’s something we’ve been thinking about as we work with a volunteer creating Wikidata that will later be editable by anyone.) Andrew said that the Met project put AI more firmly into the Wikimedia ecosystem, and that more is likely to come. He closed by demonstrating how the data created could put collections in the centre of networks of information http://w.wiki/6Bf Keep an eye out for the Wiki Art Depiction Explorer https://docs.google.com/presentation/d/1H87K5yjlNNivv44vHedk9xAWwyp9CF9-s0lojta5Us4/edit#slide=id.g34b27a5b18_0_435

Jeff Steward from Harvard Art Museums gave a thoughtful talk about how different image tagging and captioning tools (Google Vision, Imagga, Clarifai, Microsoft Cognitive Services) saw the collections, e.g. Imagga might talk about how fruit depicted in a painting tastes: sweet, juicy; how a bowl is used: breakfast, celebration. Microsoft tagger and caption tools have different views, don’t draw on each other.

Chris Alen Sula led a great session on ‘Ethical Considerations for AI’.

That evening, we went to an event at the Cooper Hewitt for more discussion of https://twitter.com/hashtag/MuseumsAI and the launch of their Interaction Lab https://www.cooperhewitt.org/interaction-lab/ Andrea Lipps and Harrison Pim’s talks reminded me of earlier discussion about holding cultural institutions to account for the decisions they make about AI, surveillance capitalism and more. Workshops like this (and the resulting frameworks) can provide the questions but senior staff must actually ask them, and pay attention to the answers. Karen Palmer’s talk got me thinking about what ‘democratising AI’ really means, and whether it’s possible to democratise something that relies on training data and access to computing power. Democratising knowledge about AI is a definite good, but should we also think about alternatives to AI that don’t involve classifications, and aren’t so closely linked to surveillance capitalism and ad tech?

The next day began with an inspiring talk from Effie Kapsalis on the Smithsonian Institution’s American Women’s History Initiative https://womenshistory.si.edu/ They’re thinking about machine learning and collections as data to develop ethical guidelines for AI and gender, analysing representations of women in multidisciplinary collections, enhancing data at scale and infusing the web with semantic data on historical women.

Shannon Darrough, MoMA, talked about a machine learning project with Google Arts and Culture to identify artworks in 30,000 installation photos, based on 70,000 collection images https://moma.org/calendar/exhibitions/history/identifying-art It was great at 2D works, not so much 3D, installation, moving image or performance art works. The project worked because they identified a clear problem that machine learning could solve. His talk led to discussion about sharing training models (i.e. once software is trained to specialise in particular subjects, others can re-use the ‘models’ that are created), and the alignment between tech companies’ goals (generally, shorter-term, self-contained) and museums’ (longer-term, feeding into core systems).

I have fewer notes from talks by Lawrence Swiader (American Battlefield Trust) with good advice on human-centred processes, Juhee Park (V&A) on frameworks for thinking about AI and museums, Matthew Cock (VocalEyes) on chat bots for venue accessibility information, and Carolyn Royston and Rachel Ginsberg (on the Cooper Hewitt’s Interaction Lab), but they added to the richness of the day. My talk was on ‘operationalising AI at a national library’, my slides are online https://www.slideshare.net/miaridge/operationalising-ai-at-a-national-library The final activity was on ‘managing AI’, a subject that’s become close to my heart.

Notes from Digital Humanities 2019 (DH2019 Utrecht)

My rough notes from the Digital Humanities 2019 conference in Utrecht. All the usual warnings about partial attention / tendency for distraction apply. My comments are usually in brackets.

I found the most useful reference for the conference programme to be https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&presentations=show but it doesn't show the titles or abstracts for papers within panels.

Some places my colleagues and I were during the conference: https://blogs.bl.uk/digital-scholarship/2019/07/british-library-digital-scholarship-at-digital-humanities-2019-.html http://livingwithmachines.ac.uk/living-with-machines-at-digital-humanities-2019/

DH2019 Keynote by Francis B. Nyamnjoh, 'African Inspiration for Understanding the Compositeness of Being Human through Digital Technology'

https://dh2019.adho.org/wp-content/uploads/2019/07/Nyamnjoh_Digital-Humanities-Keynote_2019.pdf

Notion of complexity, and incompleteness familiar to Africa. Africans frown on attempts to simplify

How do notions of incompleteness provide food for thought in digital humanities?

Nyamnjoh decries the sense of superiority inspired by zero sum games. 'Humans are incomplete, nature is incomplete. Religious bit. No one can escape incompleteness.' (Phew! This is something of a mantra when you work with collections at scale – working in cultural institutions comes with a daily sense that the work is so large it will continue after you're just a memory. Let's embrace rather than apologise for it)

References books by Amos Tutuola

Nyamnjoh on hidden persuaders, activators. Juju as a technology of self-extension. With juju, you can extend your presence; rise beyond ordinary ways of being. But it can also be spyware. (Timely, on the day that Zoom was found to allow access to your laptop camera – this has positives and negatives)

Nyamnjoh: DH as the compositeness of being; being incomplete is something to celebrate. Proposes a scholarship of conviviality that takes in practices from different academic disciplines to make itself better.

Nyamnjoh in response to Micki K's question about history as a zero-sum game in which people argue whether something did or didn't happen: create archives that can tell multiple stories, complexify the stories that exist

DH2019 Day 1, July 10

LP-03: Space Territory GeoHumanities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=455&presentations=show Locating Absence with Narrative Digital Maps

How to combine new media production with DH methodologies to create kit for recording and locating in the field.

Why georeference? Situate context, comparison old and new maps, feature extraction, or exploring map complexity.

Maps Re-imagined: Digital, Informational, and Perceptional Experimentations in Progress by Tyng-Ruey Chuang, Chih-Chuan Hsu, Huang-Sin Syu used OpenStreetMap with historical Taiwanese maps. Interesting base map options inc ukiyo style https://bcfuture.github.io/tileserver/Switch.html

Oceanic Exchanges: Transnational Textual Migration And Viral Culture

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=477&presentations=show Oceanic Exchanges studies the flow of information, searching for historical-literary connections between newspapers around the world; seeks to push the boundaries of research w newspapers

Challenges: imperfect comparability of corpora – data is provided in different ways by each data provider; no unifying ontology between archives (no generic identification of specific items); legal restrictions; TEI and other work hasn't been suitable for newspaper research
Limited ability to conduct research across repositories. Deep semantic multilingual text mining remains a challenge. Political (national) and practical organisation of archives currently determines questions that can be asked, privileges certain kinds of enquiry.
Oceanic Exchanges project includes over 100 million pages. Corpus exploration tool needed to support: exploring data (metadata and text); other things that went by too quickly.

The Past, Present and Future of Digital Scholarship with Newspaper Collections

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=483&presentations=show

I was on this panel so I tweeted a bit but have no notes myself.

My abstract: https://www.openobjects.org.uk/2019/07/the-past-present-and-future-of-digital-scholarship-with-newspaper-collections/
My slides: https://www.slideshare.net/miaridge/living-with-machines-at-the-past-present-and-future-of-digital-scholarship-with-newspaper-collections-154700888
See also: http://livingwithmachines.ac.uk/living-with-machines-at-digital-humanities-2019/
@RossiAtanassova Laurel Brake: A researcher's wish list for digitised newspaper journals pic.twitter.com/rNmuuBOFb8
@giovanni1085 @printjournalism list of existing (and very much felt) problems/challenges for digital media history. But, there is hope and we persevere #DH2019 pic.twitter.com/LSilbMi9vg
@juliannenyhan Crucial points by @Ajprescott about necessity of developing critical frameworks for scholarship with digital newspapers that assist in helping us understand how & why digital newspaper collections take form they do & how e.g. power, bias & absence act on and through them #dh2019 pic.twitter.com/WSfjC2aq2t

Working with historical text (digitised newspapers, books, whatever) collections at scale has some interesting challenges and rewards. Inspired by all the newspaper sessions? Join an emerging community of practitioners, researchers and critical friends via this document from a 'DH2019 Lunch session – Researchers & Libraries working together on improving digitised newspapers' https://docs.google.com/document/d/1JJJOjasuos4yJULpquXt8pzpktwlYpOKrRBrCds8r2g/edit

Zotero group for Historical Periodicals https://www.zotero.org/groups/704613/historical_periodicals
Discussion list https://groups.google.com/forum/#!forum/digital-historical-periodica

Complexities, Explainability and Method

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=486&presentations=show I enjoyed listening to this panel which is so far removed from my everyday DH practice.

Other stuff

Tweet: If you ask a library professional about digitisating (new word alert!) a specific collection and they appear to go quiet, this is actually what they're doing – digitisation takes shedloads of time and paperwork https://twitter.com/CamDigLib/status/1148888628405395456

Posters

'Why engage with digital source criticism' seems particularly relevant to our 'fake news' era https://ranke2.uni.lu pic.twitter.com/JxV83RFTY6
Good 'DH for fun' poster https://correspsearch.net/quotesalute/ 'inspiring greetings for your correspondence'
I missed the special poster session so excited to see PDFs and links at 'Digital Humanities – the perspective of Africa' https://dhafrica.blog/outcomes/

@LibsDH ADHO Lib & DH SIG meetup

There was a lunchtime meeting for 'Libraries and Digital Humanities: an ADHO Special Interest Group', which was a lovely chance to talk libraries / GLAMs and DH. You can join the group via https://docs.google.com/forms/d/e/1FAIpQLSfswiaEnmS_mBTfL3Bc8fJsY5zxhY7xw0auYMCGY_2R0MT06w/viewform or the mailing list at http://lists.digitalhumanities.org/mailman/listinfo/libdh-sig

DH2019 Day 2, July 11

XR in DH: Extended Reality in the Digital Humanities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=523&presentations=show

Another panel where I enjoyed listening and learning about a field I haven't explored in depth. Tweet from the Q&A: 'Love the 'XR in DH: Extended Reality in the Digital Humanities' panel responses to a question about training students only for them to go off and get jobs in industry: good! Industry needs diversity, PhDs need to support multiple career paths beyond academia'

Data Science & Digital Humanities: new collaborations, new opportunities and new complexities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=532&presentations=show Beatrice Alex, Anne Alexander, David Beavan, Eirini Goudarouli, Leonardo Impett, Barbara McGillivray, Nora McGregor, Mia Ridge

My work with open cultural data has led to me asking 'how can GLAMs and data scientists collaborate to produce outcomes that are useful for both?'. Following this, I presented a short paper; more info at https://www.openobjects.org.uk/2019/07/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science/ and slides at https://www.slideshare.net/miaridge/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science

As summarised in tweets:

https://twitter.com/semames1/status/1149250799232540672, 'data science can provide new routes into library collections; libraries can provide new challenging sources of information (scale, untidy data) for data scientists';
https://twitter.com/sp_meta/status/1149251010025656321 'library staff are often assessed by strict metrics of performance – items catalog, speed of delivery to reading room – that isn’t well-matched to messy, experimental collaborations with data scientists';
https://twitter.com/melissaterras/status/1149251480576303109 'Copyright issues are inescapable… they are the background noise to what we do';
https://twitter.com/sp_meta/status/1149251656720289792 'How can library infrastructure change to enable collaboration with data scientists, encouraging use of collections as data and prompting researchers to share their data and interpretations back?';
(me) 'I'm wondering about this dichotomy between 'new' or novel, and 'useful' or applied – is there actually a sweet spot where data scientists can work with DH / GLAMs or should we just apply data science methods and also offer collections for novel data science research? Thinking of it as a scale of different aspects of 'new to applied research' rather than a simple either/or'.

SP-19: Cultural Heritage, Art/ifacts and Institutions

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=462&presentations=show

“Un Manuscrit Naturellement” Rescuing a library buried in digital sand

1979, agreement with Ministry of Culture and IRHT to digitise all manuscripts stored in French public libraries. (Began with microfilm, not digital). Safe, but not usable. Financial cost of preserving 40TB of data was prohibitive, but BnF started converting TIFFs to JP2 which made storage financially feasible. Huge investment by France in data preservation for digitised manuscripts.
Big data cleaning and deduplication process, got rid of 1 million files. Discovered errors in TIFF when converting to JP2. Found inconsistencies with metadata between databases and files. 3 years to do the prep work and clean the data!
‘A project which lasts for 40 years produces a lot of variabilities’. Needed a team, access to proper infrastructure; the person with memory of the project was key.
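
(To make the deduplication step a bit more concrete, here is a minimal sketch in Python. It is not the project's actual process, which my notes don't cover; it assumes that duplicates are byte-identical files, and the function names are my own hypothetical ones.)

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash; return only groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling find_duplicates on a folder of image files returns lists of paths that share identical content, which a person could then review before deleting anything. Files that differ by even one byte (for example, the same scan saved twice with different headers) would not be caught by this approach.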

A Database of Islamic Scientific Manuscripts — Challenges of Past and Future

(Following on from the last paper, digital preservation takes continuous effort). Moving to RDF model based on CIDOC-CRM, standard triple store database, standard ResearchSpace/Metaphactory front end. Trying to separate the data from the software to make maintenance easier.

Analytical Edition Detection In Bibliographic Metadata; The Emerging Paradigm of Bibliographic Data Science

Tweet: Two solid papers on a database for Islamic Scientific Manuscripts and data science work with the ESTC (English Short Title Catalogue) plus reflections on the need for continuous investment in digital preservation. Back on familiar curatorial / #MuseTech ground!
Lahti – Reconciling / data harmonisation for early modern books is so complex that there are different researchers working on editions, authors, publishers, places

Syriac Persons, Events, and Relations: A Linked Open Factoid-based Prosopography

Prosopography and factoids. His project relies heavily on authority files that http://syriaca.org/ produces. Modelling factoids in TEI; usually it’s done in relational databases.
Prosopography used to be published as snippets of narrative text about those people for whom enough information was available
Factoid – a discrete piece of prosopographical information asserted in a primary source text and sourced to that text.
Person, event and relation factoids. Researcher attribution at the factoid level. Using TEI because (as markup around the text) it stays close to the primary source material; can link out to controlled vocabulary
Srophe app – an open source platform for cultural heritage data used to present their prosopographical data https://srophe.app/
Harold Short says how pleased he is to hear a project like that taking the approach they have; TEI wasn’t available as an option when they did the original work (seriously beautiful moment)
Why SNAP? ‘FOAF isn’t really good at describing relationships that have come about as a result of slave ownership’
More on factoid prosopography via Arianna Ciula https://factoid-dighum.kcl.ac.uk/
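
(A rough illustration of the factoid idea as a data structure, in Python. The project described above models factoids in TEI, not in code like this; the field names, identifiers and example values below are all hypothetical and only meant to show that each factoid carries its own source and researcher attribution.)

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Factoid:
    """One assertion made by one primary source, credited to one researcher."""
    kind: str                  # "person", "event" or "relation"
    statement: str             # what the source asserts
    subjects: Tuple[str, ...]  # identifiers of the people involved (hypothetical)
    source: str                # citation for the primary source text
    asserted_by: str           # researcher who recorded this factoid


def factoids_about(person, factoids):
    """Return every factoid that mentions the given person identifier."""
    return [f for f in factoids if person in f.subjects]


# Made-up example data: two sources may assert different things about one person.
examples = [
    Factoid("person", "Person 1 was a bishop", ("person/1",), "Source A", "Researcher X"),
    Factoid("relation", "Person 1 taught Person 2", ("person/1", "person/2"), "Source B", "Researcher Y"),
    Factoid("event", "Person 2 attended a synod", ("person/2",), "Source A", "Researcher X"),
]
```

Here factoids_about("person/1", examples) gathers the first two factoids. Nothing is merged into a single 'true' biography: conflicting statements can sit side by side, each traceable to its source and to the researcher who recorded it.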

DH2019 Day 3, July 12

Complexities in the Use, Analysis, and Representation of Historical Digital Periodicals

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=527&presentations=show

Torsten Roeder: Tracing debate about a particular work through German music magazines and daily newspapers. OCR and mass digitisation made it easier to compose representative text corpora about specific subjects. Authorship information isn’t available so don’t know their backgrounds etc, means a different form of analysis. ‘Horizontal reading’ as a metaphor for his approach. Topic modelling didn’t work for looking for music criticism.
Roeder's requirements: accessible digital copies of newspapers; reliable metadata; high quality OCR or transcriptions; article borders; some kind of segmentation; deep semantic annotation – ‘but who does what?’ What should collection holders / access providers do, and what should researchers do? (e.g. who should identify entities and concepts within texts? This question was picked up in other discussion in the session, on twitter and at an impromptu lunchtime meetup)
Zef Segal. The Periodical as a Geographical Space. Relation between the two isn’t unidirectional. Imagined space constructed by the text and its layout. Periodicals construct an imaginary space that refers back to the real. Headlines, paratext, regular text. Divisions between articles. His case study for exploring the issues: HaZefirah. (sample slide image https://twitter.com/mia_out/status/1149581497680052224)
Nanette Rißler-Pipka, Historical Periodicals Research, Opportunities and Limitations. The limitations she encounters as a researcher. Building a corpus of historical periodicals for a research question often means using sources from more than one provider of digitised texts. Different searches, rights, structure. (The need for multiple forms of interoperability, again)
Wants article / ad / genre classifications. For metadata, wants bibliographical data about the title (issue, date); extractable data (dates, names, tables of contents); provenance data (who digitised, when?). When you download individual articles, you lose the metadata which would be so useful for research. Open access is vital; interoperability is important; the ability to create individual collections across individual libraries is a wonderful dream
Estelle Bunout. Impresso providing exploration tools (integrate and decomplexify NLP tools in current historical research workflows). https://impresso-project.ch/app/#/
Working on: expanding a query – find neighbouring terms and frequent OCR errors. Overview of query: where and when is it? Whole corpus has been processed with topic modelling.
Complex queries: help me find the mention of places, countries, person in a particular thematic context. Can save to collection or export for further processing.
See the unsearchable: missing issues, failure to digitise issues, failure to OCRise, corrupt files
Transparency helps researchers discover novel opportunities and make informed decisions about sources.
Clifford Wulfman – how to support transcriptions, linked open data that allows exploration of notions of periodicity, notions of the periodical. My tweet: Clifford Wulfman acknowledging that libraries don't have the resources to support special 'snowflake' projects because they're working to meet the most common needs. IME this question/need doesn't go away so how best to tackle and support it?
Q&A comment: what if we just put all newspapers on Impresso? Discussion of standardisation, working jointly, collaborating internationally
Melodee Beals comments: libraries aren’t there just to support academic researchers, academics could look to supporting the work of creative industries, journalists and others to make it easier for libraries to support them.
Subject librarian from Leiden University points out that copyright limits their ability to share newspapers after 1880. (Innovating is hard when you can't even share the data)
Nanette Rißler-Pipka says researchers don't need fancy interfaces, just access to the data (which probably contradicts the need for 'special snowflake' systems and explains why libraries can never ever make all users happy)
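
(The 'expanding a query' idea from the Impresso notes above, finding frequent OCR errors for a search term, can be roughly sketched with Python's standard library. This is a crude stand-in based on string similarity and is not how Impresso does it, which my notes don't record; the function name, cutoff value and word list are my own assumptions.)

```python
import difflib


def expand_query(term, vocabulary, cutoff=0.8, limit=10):
    """Return words from the corpus vocabulary that closely resemble the search term."""
    # Lower-case everything and leave out the search term itself.
    candidates = sorted({word.lower() for word in vocabulary} - {term.lower()})
    return difflib.get_close_matches(term.lower(), candidates, n=limit, cutoff=cutoff)


# Made-up vocabulary with two plausible OCR misreadings of 'Zeitung'.
vocabulary = ["Zeitung", "Zeituug", "Zeitnng", "Brief", "Musik"]
variants = expand_query("Zeitung", vocabulary)
```

A researcher could then search for the original term plus the suggested variants. String similarity also suggests real words that happen to look alike (with a larger vocabulary, 'Leitung' would match 'Zeitung'), so suggestions like these need a human eye before they're added to a query.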

LP-34: Cultural Heritage, Art/ifacts and Institutions

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=516&presentations=show

(I was chairing so notes are sketchier)

Mark Hill, early modern (1500-1800 but 18thC in particular) definitions of ‘authorship’. How does authorship interact with structural aspects of publishing? Shift of authorship from gentlemanly to professional occupation.
Using the ESTC. Has about 1m actors, 400k documents with actors attached to them. Actors include authors, editors, publishers, printers, translators, dedicatees. Early modern print trade was ‘trade on a human scale’. People knew each other ‘hand-operated printing press required individual actors and relationships’.
As time goes on, printers work with fewer, publishers work with more people, authors work with about the same number of people.
They manually created a network of people associated with Bernard Mandeville and compared it with a network automatically generated from ESTC.
Looking at a work network for Edmond Hoyle’s Short Treatise on the Game of Whist. (Today I learned that Hoyle's Rules, determiner of victory in family card games and of 'according to Hoyle' fame, dates back to a book on whist in the 18thC)
(Really nice use of social network analysis to highlight changes in publisher and authorship networks.) Eigenvector very good at finding important actors. In the English Civil War, who you know does matter when it comes to publishing. By 18thC publishers really matter. See http://ceur-ws.org/Vol-2364/19_paper.pdf for more.
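
(For anyone curious about what eigenvector centrality is doing: a node scores highly if it is connected to other high-scoring nodes. Below is a minimal sketch in plain Python using power iteration. It is not the authors' code; it assumes an undirected, connected network, and the node names are made up.)

```python
def eigenvector_centrality(edges, iterations=100):
    """Approximate eigenvector centrality for an undirected graph by power iteration."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # New score = own score + sum of neighbours' scores. Keeping the node's
        # own score (a shift by the identity matrix) leaves the final ranking
        # unchanged but stops the iteration oscillating on bipartite graphs.
        updated = {
            node: scores[node] + sum(scores[n] for n in linked)
            for node, linked in neighbours.items()
        }
        norm = sum(value * value for value in updated.values()) ** 0.5
        scores = {node: value / norm for node, value in updated.items()}
    return scores


# Hypothetical toy network: one publisher linked to three authors.
toy_edges = [("publisher", "author_a"), ("publisher", "author_b"), ("publisher", "author_c")]
toy_scores = eigenvector_centrality(toy_edges)
most_central = max(toy_scores, key=toy_scores.get)
```

In this toy network the publisher comes out as the most central actor, because every author's score flows through it. On a real dataset the size of the ESTC you would want a graph library rather than a hand-rolled loop.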

Richard Freedman, David Fiala, Andrew Janco et al

What is a musical quotation? Borrowing, allusion, parody, commonplace, contrafact, cover, plagiat, sampling, signifying.
Tweet: Freedman et al.'s slides for 'Citations: The Renaissance Imitation Mass (CRIM) and The Quotable Musical Text in a Digital Age' https://bit.ly/CRIM_Utrecht are a rich introduction to applications of #DigitalMusicology encoding and markup
I spend so much time in text worlds that it's really refreshing to hear from musicologists who play music to explain their work and place so much value on listening while also exploiting digital processing tools to the max

Digging Into Pattern Usage Within Jazz Improvisation (Pattern History Explorer, Pattern Search and Similarity Search) Frank Höger, Klaus Frieler, Martin Pfleiderer

'Dig that lick' jazz similarity search engine https://dig-that-lick.hfm-weimar.de/pattern_search/

Impromptu meetup to discuss issues raised around digitised newspapers research and infrastructure

See notes about DH2019 Lunch session – Researchers & Libraries working together on improving digitised newspapers. 20 or more people joined us for a discussion of the wonderful challenges and wish lists from speakers, thinking about how we can collaborate to improve the provision of digitised newspapers / periodicals for researchers.

https://twitter.com/saschel/status/1149640870628483072
Inspired by conversations about digitised newspapers at #DH2019? Think about the points / rants / lessons you’d share in manifestos by/for researchers and GLAMs

Theorising the Spatial Humanities panel

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=539&presentations=show

?? Space as a container for understanding, organising information. Chorography, the writing of the region.
Tweet: In the spatial humanities panel where a speaker mentions chorography, which along with prosopography is my favourite digital-history-enabled-but-also-old concept
Daniel Alves. Do history and literature researchers feel the need to incorporate spatial analysis in their work? A large number who do don’t use GIS. Most of them don’t believe in it (!). The rest are so tired that they prefer theorising (!!). His goal, ref last night's keynote, is not to build models, tools, the next great algorithm; it’s to advance knowledge in his specific field.
Tweet: @DanielAlvesFCSH Is #SpatialDH revolutionary? Do history and literature researchers feel the need to incorporate spatial analysis in their work? A large number who do don’t use GIS. Most of them don’t believe in it(!). The rest are so tired that they prefer theorising(!!)
Tweet: @DanielAlvesFCSH close reading is still essential to take in the inner subjectivity of historical / literary sources with a partial and biased conception of space and place
Tien Danniau, Ghent Centre for Digital Humanities – deep maps. How is the concept working for them?
Tweet: Deep maps! A slide showing some of the findings from the 2012 NEH Advanced Institute on spatial narratives and deep mapping, which is where I met many awesome DH and spatial history people #DH2019 pic.twitter.com/JiQepz7kH5
Katie McDonough, Spatial history between maps and texts: lessons from the 18thC. Refers to Richard White’s spatial history essay in her abstract. Rethinking geographic information extraction. Embedded entities, spatial relations, other stuff.
Tweet: @khetiwe24 references work discussed in https://www.tandfonline.com/doi/abs/10.1080/13658816.2019.1620235?journalCode=tgis20 … noting how the process of annotating texts requires close reading that changes your understanding of place in the text (echoing @DanielAlvesFCSH 's earlier point)
Tweet: Final #spatialDH talk 'towards spatial linguistics' #DH2019 https://twitter.com/mia_out/status/1149666605258829824
Tweet: #DH2019 Preserving deep maps? I'd talk to folk in web archiving for a sense of which issues re recording complex, multi-format, dynamic items are tricky and which are more solvable

Closing keynote: Digital Humanities — Complexities of Sustainability, Johanna Drucker

(By this point my laptop and mental batteries were drained so I just listened and tweeted. I was also taking part in a conversation about the environmental sustainability of travel for conferences, issues with access to visas and funding, etc, that might be alleviated by better incorporating talks from remote presenters, or even having everyone present online.)

Finally, the DH2020 conference is calling for reviewers. Reviewing is an excellent way to give something back to the DH community while learning about the latest work as it appears in proposals, and perhaps more importantly, learning how to write a good proposal yourself. Find out more: http://dh2020.adho.org/cfps/reviewers/

Notes from 'AI, Society & the Media: How can we Flourish in the Age of AI'

Before we start: in the spirit of the mid-2000s, I thought I'd have a go at blogging about events again. I've realised I miss the way that blogging and reading other people's posts from events made me feel part of a distributed community of fellow travellers. Journal articles don't have the same effect (they're too long and jargony for leisure readers, assuming they're accessible outside universities at all), and tweets are great for connecting with people, but they're very ephemeral. Here goes…

On September 3 I was at BBC Broadcasting House for 'AI, Society & the Media: How can we Flourish in the Age of AI?' by BBC, LCFI and The Alan Turing Institute. Artificial intelligence is a hot topic so it was a sell-out event. My notes are very partial (in both senses of the word), and please do let me know if there are errors. The event hashtag will provide more coverage: https://twitter.com/hashtag/howcanweflourish.

The first session was 'AI – What you need to know!'. Matthew Postgate began by providing context for the BBC's interest in AI. 'We need a plurality of business models for AI – not just ad-funded' – yes! The need for different models for AI (and related subjects like machine learning) was a theme that recurred throughout the day (and at other events I was at this week).

Adrian Weller spoke on the limitations of AI. It's data hungry, compute intensive, poor at representing uncertainty, easily fooled by adversarial examples (and more that I missed). We need sensible measures of trustworthiness including robustness, fairness, protection of privacy, transparency.

Been Kim shared Google's AI principles: https://ai.google/principles She's focused on interpretability – goals are to ensure that our values are aligned and our knowledge is reflected. She emphasised the need to understand your data (another theme across the day and other events this week). You can build an inherently interpretable machine model (so it can explain its reasoning) or can build an interpreter, enabling conversations between humans and machines. You can then uncover bias using the interpreter, asking what weight it gave to different aspects in making decisions.

Jonnie Penn (who won me with an early shout out to the work of Jon Agar) asked, from where does AI draw its authority? AI is feeding a monopoly of Google-Amazon-Facebook who control the majority of internet traffic and advertising spend. Power lies in choosing what to optimise for, and choosing what not to do (a tragically poor paraphrase of his example of advertising to children, but you get the idea). We need 'bureaucratic biodiversity' – need lots of models of diverse systems to avoid calcification.

Kate Coughlan – only 10% of people feel they can influence AI. They looked at media narratives re AI on axes of time (ease vs obsolescence), power (domination vs uprising), desire (gratification vs alienation), life (immortality vs inhumanity). Their survey found that each aspect was equally disempowering. Passivity drives negative outcomes re feelings about change, tech – but if people have agency, then it's different. We need to empower citizens to have an active role in shaping AI.

The next session was 'Fake News, Real Problems: How AI both builds and destroys trust in news'. Ryan Fox spoke on 'manufactured consensus' – we're hardwired to agree with our community so you can manipulate opinion by making it look like everyone else thinks a certain way. Manipulating consensus is currently legal, though against social network T&S. 'Viral false narratives can jeopardise brand trust and integrity in an instant'. Manufactured outrage campaigns etc. They're working on detecting inorganic behaviour through the noise – it's rapid, repetitive, sticky, emotional (missed some).

One of the panel questions was, would AI replace journalists? No, it's more like having lots of interns – you wouldn't have them write articles. AI is good for tasks you can explain to a smart 16 year old in the office for a day. The problematic ad-based model came up again – who is the arbiter of truth (e.g. fake news on Facebook). Who's paying for those services and what power does it give them?

This panel made me think about discussions about machine learning and AI at work. There are so many technical, contextual and ethical challenges for collecting institutions in AI, from capturing the output of an interactive voice experience with Alexa, to understanding and recording the difference between Russia Today as a broadcast news channel and as a manipulator of YouTube rankings.

Next was a panel on 'AI as a Creative Enabler'. Cassian Harrison spoke about 'Made By Machine', an experiment with AI and archive programming. They used scene detection, subtitle analysis, visual 'energy', machine learning on the BBC's Redux archive of programmes. Programmes were ranked by how BBC4 they were; split into sections then edited down to create mini BBC4 programmes.

Kanta Dihal and Stephen Cave asked why AI fascinates us in a thoughtful presentation. It's between dead and alive, uncanny (and lots more but clearly my post-lunch notetaking isn't the best).

Anna Ridler and Amy Cutler have created an AI-scripted nature documentary (trained on and re-purposing a range of tropes and footage from romance novels and nature documentaries) and gave a brilliant presentation about AI as a medium and as a process. Anna calls herself a dataset artist, rather than a machine learning artist. You need to get to know the dataset, look out for biases and mistakes, understand the humanness of decisions about what was included or excluded. Machines enact distorted versions of language.

Image: Diane Coyle's slide on 'Lessons for the era of AI' (its text is transcribed in the notes on her talk)

I don't have notes from 'Next Gen AI: How can the next generation flourish in the age of AI?' but it was great to hear about hackathons where teenagers could try applying AI. The final session was 'The Conditions for Flourishing: How to increase citizen agency and social value'. Hannah Fry – once something is dressed up as an algorithm it gains some authority that's hard to question. Diane Coyle talked about 'general purpose technologies', which transform one industry then others. Printing, steam, electricity, internal combustion engine, digital computing, AI. Her 'lessons for the era of AI' were: 'all technology is social; all technologies are disruptive and have unpredictable consequences; all successful technologies enhance human freedoms', and accordingly she suggested we 'think in systems; plan for change; be optimistic'.

Konstantinos Karachalios called for a show of hands re who feels they have control over their data and what's done with it? Very few hands were raised. 'If we don't act now we'll lose our agency'.

I'm going to give the final word to Terah Lyons as the key takeaway from the day: 'technology is not destiny'.

I didn't hear a solution to the problems of 'fake news' that doesn't require work from all of us. If we don't want technology to be destiny, we all need to pay attention to the applications of AI in our lives, and be prepared to demand better governance and accountability from private and government agents.

(A bonus 'question I didn't ask' for those who've read this far: how do BBC aims for ethical AI relate to the introduction of compulsory registration to access TV and radio? If I turn on the radio in my kitchen, my listening habits aren't tracked; if I listen via the app they're linked to my personal ID).

Crowdsourcing workshop at DH2016 – session overview

A quick signal boost for the collaborative notes taken at the DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? (held in Kraków, Poland, on 12 July as part of the Digital Humanities 2016 conference, abstract below). We'd emphasised the need to document the unconference-style sessions (see FAQ) so that future projects could benefit from the collective experiences of participants. Since it can be impossible to find Google Docs or past tweets, I've copied the session overview below. The text is a summary of key takeaways or topics discussed in each session, created in a plenary session at the end of the workshop.

Participant introductions and interests – live notes
Ethics, Labour, sensitive material: Key takeaway – questions for projects to ask at the start; don't impose your own ethics on a project, discussing them is start of designing the project.
Where to start: Engaging volunteers, tips including online communities, being open to levels of contribution, being flexible, setting up standards, quality
Workflow, lifecycle, platforms: What people were up to, the problems with hacking systems together, iiif.io, flexibility and workflows
Public expertise, education, what’s unique to humanities crowdsourcing: The humanities are contestable! Responsibility to give the public back the results of the process in re-usable
Options, schemas and goals for text encoding: Encoding systems will depend on your goals; full-text transcription always has some form of encoding, data models – who decides what it is, and when? Then how are people guided to use it? Trying to avoid short-term solutions
UX, flow, motivation: Making tasks as small as possible; creating a sense of contribution; creating a space for volunteers to communicate; potential rewards, issues like badgefication and individual preferences.
Supporting unexpected contributions; larger-scale tasks: Project scale – thinking ahead to ending projects technically, and in terms of community – where can life continue after your project ends
Finding and engaging volunteers: Using social media, reliance on personal networks, super-transcribers, problematic individuals who took more time than they gave to the project. Successful strategies are very project-dependent. Something about beer (production of Itinera Nova beer with label containing info on the project and link to website).
Ecosystems and automatic transcription: Makes sense for some projects, but not all – value in having people engage with the text. Ecosystem – depending on goals, which parts work better? Also as publication – editions, corpora – credit, copyright, intellectual property
Plenary session, possible next steps – put information into a wiki. Based around project lifecycle, critical points? Publication in an online journal? Updateable, short-ish case studies. Could be categorised by different attributes. Flexible, allows for pace of change. Illustrate principles, various challenges. Short-term action: post introductions, project updates and new blog posts, research, etc to https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=CROWDSOURCING – a central place to send new conference papers, project blog posts, questions, meet-ups.

The workshop abstract:

Crowdsourcing – asking the public to help with inherently rewarding tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge – is reasonably well established in the humanities and cultural heritage sector. The success of projects such as Transcribe Bentham, Old Weather and the Smithsonian Transcription Center in processing content and engaging participants, and the subsequent development of crowdsourcing platforms that make launching a project easier, have increased interest in this area. While emerging best practices have been documented in a growing body of scholarship, including a recent report from the Crowd Consortium for Libraries and Archives symposium, this workshop looks to the next 5 – 10 years of crowdsourcing in the humanities, the sciences and in cultural heritage. The workshop will gather international experts and senior project staff to document the lessons to be learnt from projects to date and to discuss issues we expect to be important in the future.

Photo by Digital Humanities @DH_Western

The workshop is organised by Mia Ridge (British Library), Meghan Ferriter (Smithsonian Transcription Center), Christy Henshaw (Wellcome Library) and Ben Brumfield (FromThePage).

If you're new to crowdsourcing, here's a reading list created for another event.

The state of museum technology?

On Friday I was invited to Nesta's Digital Culture Panel event to respond to their 2015 Digital Culture survey on 'How arts and cultural organisations in England use technology' (produced with Arts Council England (ACE) and the Arts and Humanities Research Council (AHRC)). As Chair of the Museums Computer Group (MCG) (a practitioner-led group of over 1500 museum technology professionals), I've been chatting to other groups about the gap between the digital skills available and those needed in the museum sector, so it's a subject close to my heart. In previous years I'd noted that the results didn't seem to represent what I knew of museums and digital from events and working in the sector, so I was curious to see the results.

Some of their key findings for museums (PDF) are below, interspersed with my comments. I read this section before the event, and found I didn't really recognise the picture of museums it presented. 'Museums' mightn't be the most useful grouping for a survey like this – the material that MTM London's Ed Corn presented on the day broke the results down differently, and that made more sense. The c2,500 museums in the UK are too varied in their collections (from dinosaurs to net art), their audiences, and their local and organisational context (from tiny village museums open one afternoon a week, to historic houses, to university museums, to city museums with exhibitions that were built in the 70s, to white cube art galleries, to giants like the British Museum and Tate) to be squished together in one category. Museums tend to be quite siloed, so I'd love to know who fills out the survey, and whether they ask the whole organisation to give them data beforehand.

According to the survey, museums are significantly less likely to engage in:

email marketing (67 per cent vs. 83 per cent for the sector as a whole) – museums are missing out! Email marketing is relatively cheap, and it's easy to write newsletters. It's also easy to ask people to sign up when they're visiting online sites or physical venues, and they can unsubscribe anytime they want to. Social media figures can look seductively huge, but Facebook is a frenemy for organisations as you never know how many people will actually see a post.
publish content to their own website (55 per cent vs. 72 per cent) – I wasn't sure how to interpret this – does this mean museums don't have their own websites? Or that they can't update them? Or is 'content' a confusing term? At the event it was said that 10% of orgs have no email marketing, website or Facebook, so there are clearly some big gaps to fill still.
sell event tickets online (31 per cent vs. 45 per cent) – fair enough, how many museums sell tickets to anything that really needs to be booked in advance?
post video or audio content (31 per cent vs. 43 per cent) – for most museums, this would require an investment to create as many don't already have filmable material or archived films to hand. Concerns about 'polish' might also be holding some museums back – they could try periscoping tours or sharing low-fi videos created by front of house staff or educators. Like questions about offering 'online interactive tours of real-world spaces' and 'artistic projects', this might reflect initial assumptions based on ACE's experience with the performing arts. A question about image sharing would make more sense for museums. Similarly, the kinds of storytelling that blog posts allow can sometimes work particularly well for history and science museums (who don't have gorgeous images of art that tell their own story).
make use of social media video advertising (18 per cent vs. 32 per cent) – again, video is a more natural format for performing arts than for museums
use crowdfunding (8 per cent vs. 19 per cent) – crowdfunding requires a significant investment of time and is often limited to specific projects rather than core business expenses, so it might be seen as too risky, but is this why museums are less likely to try it?
livestream performances (2 per cent vs. 12 per cent) – again, this is less likely to apply to museums than performing arts organisations

One of the key messages in Ed Corn's talk was that organisations are experimenting less, evaluating the impact of digital work less, and not using data in digital decision making. They're also scaling back on non-core work; some are focusing on consolidation – fixing the basics like websites (and mobile-friendly sites). Barriers include lack of funding, lack of in-house time, lack of senior digital managers, slow/limited IT systems, and lack of a digital supplier. (Many of those barriers were also listed in a small-scale survey on 'issues facing museum technologists' I ran in 2010.)

When you consider the impact of the cuts year on year since 2010, and that 'one in five regional museums at least part closed in 2015', some of those continued barriers are less surprising. At one point everyone I know still in museums seemed to be doing at least one job on top of theirs, as people left and weren't replaced. The cuts might have affected some departments more deeply than others – have many museums lost learning teams? I suspect we've also lost two generations of museum technologists – the retiring generation who first set up mainframe computers in basements, and the first generation of web-ish developers who moved on to other industries as conditions in the sector got more grim/good pay became more important. Fellow panelist Ros Lawler also made the point that museums have to deal with legacy systems while also trying to look at the future, and that museum projects tend to be slow when they could be more agile.

Like many in the audience, I really wanted to know who the 'digital leaders' – the 10% of organisations who thought digital was important, did more digital activities and reaped the most benefits from their investment – were, and what made them so successful. What can other organisations learn from them?

It seems that we still need to find ways to share lessons learnt, and to help everyone in the arts and cultural sectors learn how to make the most of digital technologies and social media. Training that meets the right need at the right time is really hard to organise and fund, and there are already lots of pockets of expertise within organisations – we need to get people talking to each other more! As I said at the event, most technology projects are really about people. Front of house staff, social media staff, collections staff – everyone can contribute something.

If you were there, have read the report or explored the data, I'd love to know what you think. And I'll close with a blatant plug: the MCG has two open calls for papers a year, so please keep an eye out for those calls and suggest talks or volunteer to help out!

My 'Welcome' notes for UKMW15 'Bridging Gaps, Making Connections'

I'm at the British Museum today for the Museums Computer Group's annual UK 'Museums on the Web' conference. UKMW15 has a packed line-up full of interesting presentations. As Chair of the MCG, I briefly introduced the event. My notes are below, in part to make sure that everyone who should be thanked is thanked! You can read a more polished version of this written with my Programme Committee Co-Chair Danny Birchall in a Guardian Culture Professionals article, 'How digital tech can bridge gaps between museums and audiences'.

UK Museums on the Web 2015: 'Bridging Gaps, Making Connections' #UKMW15

I'd like to start by thanking everyone who helped make today happen, and by asking the MCG Committee Members who are here today to stand up, so that you can chat to them, ideally even thank them, during the day. For those who don't know us, the Museums Computer Group is a practitioner-led group who work to connect, support and inspire anyone working in museum technology. (There are lots of ways to get involved – we're electing new committee members at our AGM at lunchtime, and we will also be asking for people to host next year's event at their museum or help organise a regional event.)

I'd particularly like to thank Ina Pruegel and Jennifer Ross, who coordinated the event, the MCG Committee members who did lots of work on the event (Andrew, Dafydd, Danny, Ivan, Jess, Kath, Mia, Rebecca, Rosie), and the Programme Committee members who reviewed presentation proposals sent in. They were: co-chairs: Danny Birchall and Mia Ridge, with Chris Michaels (British Museum), Claire Bailey Ross (Durham University), Gill Greaves (Arts Council England), Jenny Kidd (Cardiff University), Jessica Suess (Oxford University Museums), John Stack (Science Museum Group), Kim Plowright (Mildly Diverting), Matthew Cock (Vocal Eyes), Rachel Coldicutt (Friday), Sara Wajid (National Maritime Museum), Sharna Jackson (Hopster), Suse Cairns (Baltimore Museum of Art), Zak Mensah (Bristol Museums, Galleries & Archives).

And of course I'd like to thank the speakers and session chairs, the British Museum, Matt Caines at the Guardian, and in advance I'd like to thank all the tweeters, bloggers and photographers who'll help spread this event beyond the walls of this room.

Which brings me to the theme of the event, 'Bridging Gaps, Making Connections'. We've been running UK Museums on the Web since 2001; last year our theme was 'museums beyond the web' in recognition that barriers between 'web teams' and 'web projects' and the rest of the organisation were breaking down. But it's also apparent that the gap between tiny, small, and even medium-sized museums and the largest, best-funded museums meant that digital expertise and knowledge had not reached the entire sector. The government's funding cuts and burnout mean that old museum hands have left, and some who replace them need time to translate their experience in other sectors into museums. Our critics and audiences are confused about what to expect, and museums are simultaneously criticised for investing too much in technologies that disrupt the traditional gallery and for being 'dull and dusty'. Work is duplicated across museums, libraries, archives and other cultural organisations; academic and commercial projects sometimes seem to ignore the wealth of experience in the sector.

So today is about bridging those gaps, and about making new connections. (I've made my own steps in bridging gaps by joining the British Library as a Digital Curator.) We have a fabulous line-up representing the wealth and diversity of experience in museum technologies.

So take lots of notes to share with your colleagues. Use your time here to find people to collaborate with. Tweet widely. Ask MCG Committee members to introduce you to other people here. Let people with questions know they can post them on the MCG discussion list and connect with thousands of people working with museums and technology. Now, more than ever, an event like this isn't about technology; it's about connecting and inspiring people.

How did 'play' shape the design and experience of creating Serendip-o-matic?

Here are my notes from the Digital Humanities 2014 paper on 'Play as Process and Product' I did with Brian Croxall, Scott Kleinman and Amy Papaelias based on the work of the 2013 One Week One Tool team.

Scott has blogged his notes about the first part of our talk, Brian's notes are posted as '“If hippos be the Dude of Love…”: Serendip-o-matic at Digital Humanities 2014' and you'll see Amy's work adding serendip-o-magic design to our slides throughout our three posts.

I'm Mia, I was dev/design team lead on Serendipomatic, and I'll be talking about how play shaped both what you see on the front end and the process of making it.

How did play shape the process?

The playful interface was a purposeful act of user advocacy – we pushed against the academic habit of telling, not showing, which you see in some form here. We wanted to entice people to try Serendipomatic as soon as they saw it, so the page text, graphic design, 1 – 2 – 3 step instructions you see at the top of the front page were all designed to illustrate the ethos of the product while showing you how to get started.

How can a project based around boring things like APIs ~~and panic~~ be playful? Technical decision-making is usually a long, painful process in which we juggle many complex criteria. But here we had to practice 'rapid trust' in people, in languages/frameworks, in APIs, and this turned out to be a very freeing experience compared to everyday work.

First, two definitions as background for our work…

Just in case anyone here isn't familiar with APIs, APIs are a set of computational functions that machines use to talk to each other. Like the bank in Monopoly, they usually have quite specific functions, like taking requests and giving out information (or taking or giving money) in response to those requests. We used APIs from major cultural heritage repositories – we gave them specific questions like 'what objects do you have related to these keywords?' and they gave us back lists of related objects.

The term 'UX' is another piece of jargon. It stands for 'user experience design', which is the combination of graphical, interface and interaction design aimed at making products both easy and enjoyable to use. Here you see the beginnings of the graphic design being applied (by team member Amy) to the underlying UX related to the 1-2-3 step explanation for Serendipomatic.

Feed.

The 'feed' part of Serendipomatic parsed text given in the front page form into simple text 'tokens' and looked for recognisable entities like people, places or dates. There's nothing inherently playful in this except that we called the system that took in and transformed the text the 'magic moustache box', for reasons lost to time (and hysteria).
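
As a rough illustration of the kind of processing described above, here is a minimal Python sketch. It is not Serendipomatic's actual code: the function name `feed`, the tiny stop list and the regular expressions are all assumptions made for this example, and real entity recognition is far more involved than spotting capitalised phrases and four-digit years.

```python
import re

# Hypothetical stop list for illustration; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "on", "was", "were", "by"}


def feed(text):
    """Split text into keyword tokens and pick out crude 'entities'."""
    # Four-digit numbers that look like years between 1000 and 2099.
    dates = re.findall(r"\b(?:1\d{3}|20\d{2})\b", text)
    # Runs of two or more capitalised words, a rough stand-in for people and places.
    names = re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)
    # Lower-cased word tokens with stopwords removed, order kept, duplicates dropped.
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text.lower()):
        if word not in STOPWORDS and word not in tokens:
            tokens.append(word)
    return {"tokens": tokens, "names": names, "dates": dates}


result = feed("Ada Lovelace wrote about the Analytical Engine in 1843.")
```

Here `result["names"]` holds the two capitalised phrases and `result["dates"]` holds the year, which is roughly the shape of input the next step needs.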

Whirl.

These terms were then mixed into database-style queries that we sent to different APIs. We focused on primary sources from museums, libraries, archives available through big cultural aggregators. Europeana and the Digital Public Library of America have similar APIs so we could get a long way quite quickly. We added Flickr Commons into the list because it has high-quality, interesting images and brought in more international content. [It also turns out this made it more useful for my own favourite use for Serendipomatic, finding slide or blog post images.] The results are then whirled up so there's a good mix of sources and types of results. This is the heart of the magic moustache.
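
The mixing step might be sketched like this in Python. Everything here is an assumption for illustration: `build_query` and `whirl` are hypothetical names, the OR-style query string does not reproduce any particular API's syntax, the results are stand-in lists rather than real API responses, and round-robin interleaving is just one plausible way to get 'a good mix'.

```python
from itertools import zip_longest


def build_query(terms, limit=3):
    """Join the first few terms into a simple OR-style keyword query."""
    return " OR ".join(terms[:limit])


def whirl(results_by_source):
    """Interleave results round-robin so no single source dominates."""
    mixed = []
    # zip_longest pads shorter lists with None, which we skip.
    for batch in zip_longest(*results_by_source.values()):
        mixed.extend(item for item in batch if item is not None)
    return mixed


# Stand-in results; a real version would fetch these from each API.
sample = {
    "europeana": ["E1", "E2", "E3"],
    "dpla": ["D1"],
    "flickr_commons": ["F1", "F2"],
}
query = build_query(["lovelace", "analytical", "engine", "1843"])
mixed = whirl(sample)
```

With the sample data, the first three items in `mixed` come from three different sources, so a source that returns fewer results is not pushed to the bottom of the page.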

Marvel.

User-focused design was key to making something complicated feel playful. Amy's designs and the Outreach team's work were a huge part of it, but UX also encompasses micro-copy (all the tiny bits of text on the page), interactions (what happened when you did anything on the site), plus loading screens, error messages, user documentation.

We knew lots of people would be looking at whatever we made because of OWOT publicity; you don't get a second shot at this so it had to make sense at a glance to cut through social media noise. (This also meant testing it for mobiles and finding time to do accessibility testing – we wanted every single one of our users to have a chance to be playful.)

Without all this work on the graphic design – the look and feel that reflected the ethos of the product – the underlying playfulness would have been invisible. This user focus also meant removing internal references and in-jokes that could confuse people, so there are no references to the 'magic moustache machine'. Instead, 'Serendhippo' emerged as a character who guided the user through the site.

But how does a magic moustache make a process playful?

The moustache was a visible signifier of play. It appeared in the first technical architecture diagram – a refusal to take our situation too seriously was embedded at the heart of the project. This sketch also shows the value of having a shared physical or visual reference – outlining the core technical structure gave people a shared sense of how different aspects of their work would contribute to the whole. After all, if there's no structure and no rules, it isn't a game.

This playfulness meant that writing code (in a new language, under pressure) could then be about making the machine more magic, not about ticking off functions on a specification document. The framing of the week as a challenge and as a learning experience allowed a lack of knowledge or the need to learn new skills to be a challenge, rather than a barrier. My role was to provide just enough structure to let the development team concentrate on the task at hand.

In a way, I performed the role of old-fashioned games master, defining the technical constraints and boundaries much as someone would police the rules of a game. Previous experience with cultural heritage APIs meant I was able to make decisions quickly rather than letting indecision or doubt become a barrier to progress. Just as games often reduce complex situations to smaller, simpler versions, reducing the complexity of problems created a game-like environment.

UX matters

Ultimately, a focus on the end user experience drove all the decisions about the backend functionality, the graphic design and micro-copy and how the site responded to the user.

It's easy to forget that every pixel, line of code or text is there either through positive decisions or decisions not consciously taken. User experience design processes usually involve lots of conversation, questions, analysis, more questions, but at OWOT we didn't have that time, so the trust we placed in each other to make good decisions and in the playful vision for Serendipomatic created space for us to focus on creating a good user experience. The whole team worked hard to make sure every aspect of the design helps people on the site understand our vision so they can get on with exploring and enjoying Serendipomatic.

Some possible real-life lessons I didn't include in the paper

One Week One Tool was an artificial environment, but here are some thoughts on lessons that could be applied to other projects:

Conversations trump specifications and showing trumps telling; use any means you can to make sure you're all talking about the same thing. Find ways to create a shared vision for your project, whether on mood boards, technical diagrams, user stories, imaginary product boxes.
Find ways to remind yourself of the real users your product will delight and let empathy for them guide your decisions. It doesn't matter how much you love your content or project, you're only doing right by it if other people encounter it in ways that make sense to them so they can love it too (there's a lot of UXy work on 'on-boarding' out there to help with this). User-centred design means understanding where users are coming from, not designing based on popular opinion. You can use tools like customer journey maps to understand the whole cycle of people finding their way to and using your site (I guess I did this and various other UXy methods without articulating them at the time).
Document decisions and take screenshots as you go so that you've got a history of your project – some of this can be done by archiving task lists and user stories.
Having someone who really understands the types of audiences, tools and materials you're working with helps – if you can't get that on your team, find others to ask for feedback – they may be able to save you lots of time and pain.
Design and UX resources really do make a difference, and it's even better if those skills are available throughout the agile development process.

What are the hidden costs when you attend an event?

I think quite hard about how to make Museums Computer Group events as inclusive as possible, from the diversity of the speakers on stage, to setting dates and times as early as possible to allow cheaper pre-booked travel, to keeping event costs down and more, but there's always more to learn.

I've been thinking about the 'shadow' or hidden costs accrued when people attend events. For me, it's the cost of getting to London (up to £50 if it's at short notice) and the time it takes (up to 3 hours each way if I'm unlucky). For others, accessibility requirements add to the cost of events, whether that's sign language translators, taxis to accessible train stations, or someone else's time as an aide. For parents or people with other caring responsibilities, childcare costs may add to the expense of attending an event. This in turn affects our ability to put together a broad range of speakers for an event. So –

Hello parents in the UK! I'm thinking about hidden costs for speakers turning up to an event. How much does a day's childcare cost you?
— Mia (@mia_out) June 15, 2014

I'm asking parents in the UK for a rough estimate of childcare costs for a day. You can share yours by tweeting @mia_out or share anonymously via this form if 140 characters won't allow you to mention things like your location, number and age of kids: What are the hidden costs when you attend an event?* The second question on the form is more general, so if your costs have nothing to do with parenting, go for it! I'll share the answers so that other event organisers have a sense of the costs too.

Here are some responses to get you started – with thanks to those who've already shared their costs:

@otfrom @mia_out @JeniT £12 / hr for 2. A full day is usually 100 plus. Quite difficult to justify often esp w/ travel.
— Yodit Stanton (@yoditstanton) June 15, 2014

about £40 a day in the West Midlands.
— Andrew Fray (@tenpn) June 15, 2014

@mia_out childcare is £3/hr for a 5 year old. Was 330/mth as a child for 5 hrs a day.
— Mick Brennan (@lightzenton) June 15, 2014

We're a volunteer Committee rather than professional events organisers, and there's a humbling amount to learn from people out there. What hidden costs have I missed? Are there factors apart from cost that we should consider? We've got a Call for Papers for November 7's UKMW14: Museums Beyond the Web open at the moment (until June 30, 2014) – is there any language on that CfP or our Guidance for Speakers we should look at?

Update – more responses below.

@mia_out assuming you can actually find a reliable (qualified?) babysitter for a whole day or two, then min. wage c£6 per hour at very least
— Internet Archaeology (@IntarchEditor) June 15, 2014

@mia_out nursery (under 5s) costs might be between £30-50 per day but of course they won't do weekends
— Internet Archaeology (@IntarchEditor) June 15, 2014

@mia_out day would cost us £90 or so, but not always poss; last conf my wife presented at I took leave and came along to look after babies.
— Jakob Whitfield (@thrustvector) June 15, 2014

@mia_out @otfrom Nurseries in East Dulwich vary, but since they're all oversubscribed 1500/month isn't outrageous.
— JulianBirch (@JulianBirch) June 15, 2014

@mia_out That's 8-6 including meals. Costs typically go down after age 2.But plenty of people can't find childcare at all and give up work.
— JulianBirch (@JulianBirch) June 15, 2014

@mia_out About £50 a day. We’re lucky in that our nursery will usually have space if we need an extra day.
— suzicatherine (@suzicatherine) June 15, 2014

The daily childcare costs (£40 pc,pd) are usually factored into a working day @mia_out The early drop off and/or late pick up can be tricky!
— Kathryn Eccles (@KathrynEccles) June 15, 2014

I'd been thinking of single-day events and the impact on speaker availability, but I was reminded of the impact of childcare and other responsibilities for people wanting to attend residential programmes or longer events (or day events that require an overnight stay to fit the travel in). For example:

@mia_out can't bring pets to conf., and dog walkers are not cheap #hiddencost
— Scott (@moltude) June 16, 2014

The ability to attend residential events for career or research fellowships is obviously going to have an impact on the types of people we see in leadership positions in later years, so thinking about things like childcare (which might be as simple as providing space for someone who already helps look after the family) now would make a positive difference later. On the positive side, many fellowships provide honorariums, which could help cover the hidden costs many of you have shared with me.

* I'm experimenting with typeform but already I'm concerned that their forms don't seem accessible – how are they for you?