Toddlers to teenagers: AI and libraries in 2023

A copy of my April 2023 position paper for the Collections as Data: State of the field and future directions summit, held at the Internet Archive in Vancouver in April 2023. The full set of statements is available on Zenodo at 'Position Statements – Collections as Data: State of the field and future directions'. It'll be interesting to see how this post ages. Since writing this, I've found a new favourite metaphor: the 'brilliant, hard-working — and occasionally hungover — [medical] intern'.

A light brown historical building with columns and steps. The building is small but grand. A modern skyscraper looms in the background.
The Internet Archive building in Vancouver

My favourite analogy for AI / machine learning-based tools[1] is that they’re like working with a child. They can spin a great story, but you wouldn’t bet your job on it being accurate. They can do tasks like sorting and labelling images, but as they absorb models of the world from the adults around them you’d want to check that they haven’t mistakenly learnt things like ‘nurses are women and doctors are men’.

Libraries and other GLAMs have been working with machine learning-based tools for a number of years, cumulatively gathering evidence for what works, what doesn’t, and what it might mean for our work. AI can scale up tasks like transcription, translation, classification, entity recognition and summarisation quickly – but it shouldn’t be used without supervision if the answer to the question ‘does it matter if the output is true?’ is ‘yes’.[2] Training a model and checking the results of an external model both require resources and expertise that may be scarce in GLAMs.

But the thing about toddlers is that they’re cute and fun to play with. By the start of 2023, ‘generative AI’ tools like the text-to-image tool DALL·E 2 and large language models (LLMs) like ChatGPT had captured the public imagination. You’ve probably heard examples of people using LLMs as everything from an oracle (‘give me arguments for and against remodelling our kitchen’) to a tutor (‘explain this concept to me’) to a creative spark for getting started with writing code or a piece of text. If you don’t have an AI strategy already, you’re going to need one soon.

The other thing about toddlers is that they grow up fast. GLAMs have an opportunity to help influence the kinds of teenagers, and then adults, they become – but we need to be proactive if we want AI that produces trustworthy results and doesn’t create further biases. Improving AI literacy within the GLAM sector is an important part of being able to make good choices about the technologies we give our money and attention to. (The same is also true for our societies as a whole, of course.)

Since the 2017 summit, I’ve found myself thinking about ‘collections as data’ in two ways.[3] One is the digitised collections records (from metadata through to full page or object scans) that we share with researchers interested in studying particular topics, formats or methods; the other is the data that GLAMs themselves could generate about their collections to make them more discoverable and better connected to other collections. The development of specialist methods within computer vision and natural language processing has promise for both sorts of ‘collections as data’,[4] but we still have much to learn about the logistical, legal, cultural and training challenges in aligning the needs of researchers and GLAMs.

The buzz around AI and the hunger for more material to feed into models has introduced a third – collections as training data. Libraries hold vast repositories of historical and contemporary collections that reflect both the best thinking and the worst biases of the society that produced them. What is their role in responsibly and ethically stewarding those collections into training data (or not)?

As we learn more about the different ‘modes of interaction’ with AI-based tools, such as ‘text-grounded’, ‘knowledge-seeking’ and ‘creative’ modes,[5] and collect examples of researchers and institutions using tools like large language models to create structured data from text,[6] we’re better able to understand and advocate for the role that AI might play in library work. Through collaborations within the Living with Machines project, I’ve seen how we could combine crowdsourcing and machine learning to clear copyright for orphan works at scale; improve metadata and full-text searches with word vectors that help people match keywords to concepts rather than literal strings; disambiguate historical place names; and turn symbols on maps into computational information.
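Word vectors place terms in a shared numeric space, so a search can return material about related concepts even when it never uses the literal keyword. Here is a minimal sketch of that idea. It is not the Living with Machines code: the tiny hand-made vectors and vocabulary are hypothetical, and real systems learn vectors with hundreds of dimensions from large text corpora.

```python
import math

# Hypothetical 3-dimensional "word vectors", invented for illustration only.
# Real word vectors are learned from large corpora and are much longer.
TOY_VECTORS = {
    "loom":    [0.9, 0.1, 0.0],
    "weaving": [0.8, 0.2, 0.1],
    "mill":    [0.7, 0.3, 0.1],
    "railway": [0.1, 0.9, 0.2],
    "station": [0.0, 0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def expand_query(keyword, vectors, threshold=0.9):
    """Return other terms whose vectors are close to the keyword's, most similar first."""
    query = vectors[keyword]
    scored = [(term, cosine(query, vec)) for term, vec in vectors.items() if term != keyword]
    related = [(term, score) for term, score in scored if score >= threshold]
    return [term for term, _ in sorted(related, key=lambda pair: pair[1], reverse=True)]

print(expand_query("loom", TOY_VECTORS))
```

With these toy values, a search for 'loom' is expanded to 'weaving' and 'mill' but not 'railway', so records that never mention looms could still be found. The `threshold` is arbitrary and would need tuning against real data.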

Our challenge now is to work together with the Silicon Valley companies that shape so much of what AI ‘knows’ about the world, with the communities and individuals that created the collections we care for, and with the wider GLAM sector to ensure that we get the best AI tools possible.

[1] I’m going to use ‘AI’ as a shorthand for ‘AI and machine learning’ throughout, as machine learning models are the most practical applications of AI-type technologies at present. I’m excluding ‘artificial general intelligence’ for now.

[2] Tiulkanov, “Is It Safe to Use ChatGPT for Your Task?”

[3] Much of this thinking is informed by the Living with Machines project, a mere twinkle in the eye during the first summit. Launched in late 2018, the project aims to devise new methods, tools and software in data science and artificial intelligence that can be applied to historical resources. A key goal for the Library was to understand and develop some solutions for the practical, intellectual, logistical and copyright challenges in collaborative research with digitised collections at scale. As the project draws to an end five and a half years later, I’ve been reflecting on lessons learnt from our work with AI, and on the dramatic improvements in machine learning tools and methods since the project began.

[4] See for example Living with Machines work with data science and digital humanities methods documented at https://livingwithmachines.ac.uk/achievements

[5] Goldberg, “Reinforcement Learning for Language Models.” April 2023. https://gist.github.com/yoavg/6bff0fecd65950898eba1bb321cfbd81.

[6] For example, tools like Annif https://annif.org, and the work of librarian/developers like Matt Miller, and of genealogists.

Little, “AI Genealogy Use Cases, How-to Guides.” 2023. https://aigenealogyinsights.com/ai-genealogy-use-cases-how-to-guides/

Miller, “Using GPT on Library Collections.” March 30, 2023. https://thisismattmiller.com/post/using-gpt-on-library-collections/.

Is 'clicks to curiosity triggered' a good metric for GLAM collections online?

The National Archives UK have a 'new way to explore the nation’s archives' and it's lovely: https://beta.nationalarchives.gov.uk/explore-the-collection/

It features highlights from their collections and 'stories behind our records'. The front page offers options to explore by topic (based on the types of records that TNA holds) and time period. It also has direct links to individual stories, with carefully selected images and preview text. Three clicks in and I was marvelling at a 1904 photo from a cotton mill, and connecting it to other knowledge.

When you click into a story about an individual record, there's a 'Why this record matters' heading, which reminds me of the Australian model for a simple explanation of the 'significance' of a collection item. Things get a bit more traditional 'catalogue record online' when you click through to the 'record details' but overall it's an effective path that helps you understand what's in their collections.

The simplicity of getting to an interesting item has made me wonder about a new UX metric for collections online – 'time to curiosity inspired', or more accurately 'clicks to curiosity triggered'. 'Clicks to specific item' is probably a more common metric for catalogue-based searches, but this is a different type of invitation to explore a collection via loosely themed stories.
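One way to make 'clicks to curiosity triggered' measurable is to count the clicks from the landing page to the first story or record page in each visit. This is a rough sketch that assumes hypothetical session logs and treats 'reached a story or record page' as a proxy for curiosity. That is a big assumption, since a page view doesn't prove anyone was curious.

```python
from statistics import median

# Assumption: reaching a 'story' or 'record' page stands in for "curiosity triggered".
CURIOSITY_PAGES = {"story", "record"}

def clicks_to_curiosity(session, targets=CURIOSITY_PAGES):
    """Clicks after the landing page (index 0) until the first target page, or None."""
    for clicks, page in enumerate(session):
        if page in targets:
            return clicks
    return None

def median_clicks_to_curiosity(sessions):
    """Median over sessions that reached a target page; None if none did."""
    counts = [c for c in (clicks_to_curiosity(s) for s in sessions) if c is not None]
    return median(counts) if counts else None

# Hypothetical sessions: each is the ordered list of page types a visitor viewed.
sessions = [
    ["home", "topic", "story"],                  # 2 clicks
    ["home", "time-period", "topic", "record"],  # 3 clicks
    ["home", "story"],                           # 1 click
    ["home", "search", "search"],                # never reached a story or record
]
print(median_clicks_to_curiosity(sessions))
```

With these sample sessions, the median is 2 clicks. Sessions that never reach a story or record are left out of the median, so you'd also want to report how many visits never got there.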

Read the 'About' post at https://blog.nationalarchives.gov.uk/new-way-to-explore-the-nations-archives/ and others under the 'Project ETNA' tag.

Screenshot of the Explore website, with colourful pictures next to headings like 'explore by topic', 'explore by time period' and 'registered design for an expanding travelling basket'

Live-blog from MCG's Museums+Tech 2022

The Museums Computer Group's conference has been an annual highlight for some years now, and in 2022 I donned my mask and went to their in-person event. And only a few months later I'm posting this lightly edited version of my Mastodon posts from the day of the event in November 2022… Notes in brackets are generally from the original toots/posts.

This was the first event that I live-blogged on Mastodon rather than live-tweeting. I definitely missed the to-and-fro of conversation around a hashtag, as in mid-November Mastodon was a lot quieter than it was even a few weeks later. Anyway, on with the post!

I'm at the Museums Computer Group's #MuseTech2022 conference.

Here's the programme https://museumscomputergroup.org.uk/events/museumstech-2022-turning-it-off-and-on-again/

Huuuuuuge thanks to the volunteers who worked so hard on the event – and as Chair Dafydd James says, who've put extra work into making this a hybrid event https://museumscomputergroup.org.uk/about/committee/

Keynote Kati Price on the last two and a half years – a big group hug or primal scream might help!

She's looking at the consequences of the pandemic and lockdowns in terms of: collaboration, content, cash, churn

Widespread adoption of tools as people found new ways of collaborating from home

Content – the 'hosepipe of requests' for digital content is all too familiar. Lockdown reduced things to one unifying goal – to engage audiences online

(In hindsight, that moment of 'we must find / provide entertainment online' was odd – the world was already full of books, tv, podcasts, videos etc – did we want things we could do together that were a bit like things we'd do IRL?)

V&A moved to capture their Kimono exhibition to share online just before closing for lockdown. Got a Time Out 'Time In'. No fancy tech, just good storytelling

Took a data-informed approach to creating content e.g. ASMR videos. Shows the benefits of 'format thinking'. Recommends https://podcasts.apple.com/us/podcast/episode-016-matt-locke/id1498470334?i=1000500799064 #MuseTech2022

V&A found that people either wanted very short or long form content; some wanted informative, others light-hearted content

Cash – how do you keep creating great experiences when income drops? No visitors, no income.

Churn – 'the great resignation' – we've seen a brain drain in the #MuseTech / GLAM sector, especially as it's hard to attract people given salaries. Not only in tech – loss of expert collections, research staff who help inform online content

UK's heading into recession, so more cuts are probably coming. What should a digital team look like in this new era?

Also, we're all burnt out. (Holler!) Emotional reserves are at an all-time low.

(Thinking about the silos – I feel my work-social circles are dwindling as I don't run into people around the building now most people are WFH most of the time)

Back from the break at #MuseTech2022 for more #MuseTech goodness, starting with Seb Chan and Indigo Holcombe-James on ACMI's CEO Digital Mentoring Program – could you pair different kinds of organisations and increase the digital literacy of senior leaders?

Working with a mentor had tangible and intangible benefits (in addition to making time for learning and reflection). The next phase was shorter, with fewer people. (Context for non-Australians – Melbourne's lockdown was *very* long and very restrictive)

(I wonder what a 'minimum viable mentorship' model might be – does a long coffee with someone count? I've certainly had my brain picked that way by senior leaders interested in digital participation and strategy)

Lessons – cross-art form conversations work really well; everyone is facing similar challenges

(Side note – I'm liking that longer posts mean I'm not dashing off posts to keep up with the talks)

Next up #MuseTech2022 Stephanie Bertrand https://twitter.com/sbrtrandcurator on prestige and aesthetic judgement in the art world. Can you recruit the public's collective intelligence to discover artworks? But can you remove the influence of official 'art world' taste makers in judging artworks?

'Social feedback is a catch-22' – can have runaway inequality where popular content becomes more popular, and artificial manipulation that skews what's valued?

Now Somaya Langley https://twitter.com/criticalsenses on making digital preservation an everyday thing. (Shoutout to the awesome #DigiPres folk who do this hard work) – how can a whole organisation include digital preservation in its wider thinking about collections and corporate records? What about collecting born-digital content so prevalent in modern life?

(Side note – Australia seems to have a much stronger record management culture within GLAMs than in the UK, where IME you really have to search to find organisational expectations about archiving project records)

#MuseTech2022 Somaya's lessons learnt include: use the 'three-legged stool' of digital preservation (technology, resources and organisation) https://deepblue.lib.umich.edu/bitstream/handle/2027.42/60441/McGovern-Digital_Decade.html?sequence=4 – approach it holistically

Help colleagues learn by doing

Moving from Projects to Programmes to Business as Usual is hard

Help people be comfortable with there not being one right answer, and ok with 'it depends'

#MuseTech2022 Next up in Session 2: Collections; Craig Middleton, Caroline Wilson-Barnao, Lisa Enright – documenting intense bushfires in Aus summer 2019/20 and COVID. They used Facebook as a short-term response to the crisis; planned a physical exhibition but a website came to seem more appropriate as COVID went on. https://momentous.nma.gov.au has over 300 unique responses. FB helpful for seeing if a collecting idea works while it's timely, but other platforms better for sustained engagement. Also need to think about comfort levels about sharing content changing as time goes on.

Museums can be places to have difficult conversations, to help people make sense of crises. But museums also need to think beyond physical spaces and include digital from the start.

Also hard when museum people are going through the same crises (links back to Kati's keynote about what we lived through as a sector working for our audiences while living through the pando ourselves)

#MuseTech2022 David Weinczok 'using digital media to go local'

60% of National Museums Scotland's online audiences have never visited their museums. 'Telling the story of an object without the context of the landscape and community it came from' can help link online and in-person audiences and experiences

'Museum Screen Time' – experts react to pop culture depictions of their subject area eg Viking culture https://www.nms.ac.uk/explore-our-collections/films/museum-screen-time-viking-age/

Blog series 'Objects in Place' – found items in collections from a particular area, looked to tell stories with objects as 'connective threads', not the focus in themselves

'What can we do online to make connections with people and communities offline?'

(So many speakers are finishing with questions – I love this! Way to make the most of being in conversation with the musetech community here)

Next at #MuseTech2022, Amy Adams & Karen Clarke, National Museum of the Royal Navy – digital was always lower priority before COVID; managed to do lots of work on collections data during lockdowns.

They finally got a digital asset management (DAM) system, but then had to think about maintaining it; explaining why implementation takes time. Then there was an expectation that they could 'flip a switch' and put all the collections online. Finding ways to have positive conversations with folk who are still learning about the #MuseTech field.

Also doing work on 'addressing empires' – I like that framing for a very British institution.

Now Rebecca Odell, Niti Acharya, Hackney Museum on surviving a cyber attack. Lost access to collections management database (CMS) and images. Like their digital building had burnt down. Stakeholder and public expectations did not adjust accordingly! 14 months without a CMS.

Know where your backups are! Export DBs as CSV and store them externally. LOCKSS (Lots Of Copies Keep Stuff Safe), hard drives
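Exporting a database to plain CSV files gives you a copy that doesn't depend on the original software. Most collections management systems have their own export tools. Purely for illustration, this sketch assumes a SQLite database and uses Python's standard library to write each table to its own CSV file.

```python
import csv
import sqlite3
from pathlib import Path

def export_tables_to_csv(conn, out_dir):
    """Write every user table in a SQLite database to its own CSV file; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # List user tables, skipping SQLite's internal ones.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
    written = []
    for table in tables:
        # Table names come from sqlite_master, so quoting them here is safe.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        headers = [col[0] for col in cursor.description]
        path = out_dir / f"{table}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(headers)
            writer.writerows(cursor)
        written.append(path)
    return written

if __name__ == "__main__":
    import tempfile
    # Hypothetical in-memory database standing in for a real catalogue.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (accession TEXT, title TEXT)")
    conn.execute("INSERT INTO objects VALUES ('1982.1', 'Toy loom')")
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_tables_to_csv(conn, tmp):
            print(path.name, path.read_text(encoding="utf-8"))
```

The exported files only help as a backup if they're stored somewhere separate from the system they came from.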

#MuseTech2022 Rebecca Odell, Niti Acharya, Hackney Museum continued – reconstructing your digital stuff from backups, exports, etc takes tiiiiiiime and lots of manual work. The sector needs guides, checklists, templates to help orgs prepare for cyber attacks.

(Lots of her advice also applies to your own personal digital media, of course. Back up your backups and put them lots of places. Leave a hard drive at work, swap one with a friend!)

New Q&A game – track the echo between remote speakers and the AV system in the back. Who's unmuted that should be muted? [One of the joys of a hybrid conference]

We'll be heading out to lunch soon, including the MCG annual general meeting

#MuseTech2022

(Missed a few talks post-lunch)

Adam Coulson (National Museums Scotland) on QR codes:
* weren't scanned in all exhibition/gallery contexts
* use them to add extra layers, not core content
* don't assume everyone will scan
* discourage FOMO (explain what's there)
* consider precious battery life

More at https://blog.nms.ac.uk/2022/07/19/qr-codes-in-museums-worth-the-effort/

Now Sian Shaw (Westminster Abbey) on no longer printing 12,000 sheets of paper a week (given out to visitors with that day's info). Made each order of service (dunno, church stuff, I am a heathen) at the same URL with templates to drop in commonly used content like hymns

It's a web page, not an app – more flexible, better affordances re your place on the page

Some loved the move to sustainability but others don't like having phones out in church.

Ultimately, be led by the problem you're trying to solve (and there's always a paper backup for no/dead phone folk)

Q&A discussion – take small steps, build on lessons learnt

#MuseTech2022 Onto the final panel, 'Funding digital – what two years worth of data tells us'

#MuseTech2022 (It's funny when you have an insight into your own life via a remark at a conference – the first ever museum team I worked in was 'Outreach' at Melbourne Museum, which combined my digital team with the learning team under the one director. I've always known that working in Outreach shaped my world view, but did sitting next to the learning team also shape it?)

And now Daf James is finishing with thanks for the committee members behind the MCG generally and the event in particular – big up @irny for keeping the tech going in difficult circumstances!

Daf James welcomes online and in-person attendees to the Museums Computer Group's Museums+Tech 2022 conference

National approaches to crowdsourcing / citizen science?

This is a 'work in progress' post that I hope to add to as I gather information about national portals for crowdsourcing / citizen science / citizen history and other forms of voluntary digital / online participation.

While portals like SciStarter, Crowds4U and platforms like Zooniverse, FromThePage, HistoryPin etc are a great way to search across projects for something that matches your interests, I'm interested in the growth of national portals or indexes to projects (they might also be called 'project finders'). It's not so much the sites themselves that interest me as the underlying networks of regional communities of practice, national or regional infrastructure and other signs of national support that they might variously reflect or help create. If you're interested in specific projects outside the UK-US/English-language bubble, check out Crowdsourcing the world's heritage. I've also shared a 2015 list of 'participatory digital heritage sites' that includes many crowdsourcing sites.

If you know of a national portal or umbrella organisation for crowdsourcing, please drop me a line! Last updated: Feb 7, 2023.

Austria

Jan Smeddinck emailed to share the LBG Open Innovation in Science Center https://ois.lbg.ac.at/

Brazil

Lesandro Ponciano nominated 'Civis, which is the Brazilian Citizen Science platform. The link is https://civis.ibict.br/ Civis was built by using the same software developed by Ibercivis in Spain for the eu-citizen.science platform. Civis was launched in 2022 – the event (in Portuguese) is recorded on YouTube at https://www.youtube.com/live/_nPqmcq0gos'

Canada

The Canadian Citizen Science portal

France

This post was inspired by the apparently coordinated approach in France. The Archives nationales participatives site has 'Projets collaboratifs de transcriptions, annotations et indexations' – that is, participatory national archives with collaborative transcription, annotation and indexing projects.

They also have Le réseau Particip-Arc, a 'network of actors committed to participatory science in the fields of culture', supported by the Ministry of Culture and coordinated by the National Museum of Natural History.

European Union

EU-citizen.science is a 'platform for sharing citizen science projects, resources, tools, training and much more'.

Germany / German-language projects

The German / German-language citizen science portal

Netherlands

Alastair Dunning pointed to the Citizen Science network, run by @CitSciLab (Margaret Gold).

Norway

Agata Bochynska said, 'Norway has recently formed a national network for citizen science that’s coordinated by Research Council of Norway' – Nasjonalt nettverk for folkeforskning (folkeforskning translates as 'folk research' according to Google).

Scotland

The Scottish Citizen Science portal

Slovenia

https://citizenscience.si/ lists current and completed citizen science projects in Slovenia, infrastructure available to support projects, and events and other activities. Hat tip Mitja V. Iskrić on Mastodon.

Sweden

David Haskiya reports: 'medborgarforskning.se/ Provides an intro to citizen science, a catalogue of Swedish projects, etc. Seems to be part of an EU-network of such sites. Summary in English here https://medborgarforskning.se/eng/'

A Swedish national hub for everyone interested in citizen science (medborgarforskning). The project was funded by Vinnova (Sweden’s innovation agency), the University of Gothenburg, the Swedish University of Agricultural Sciences and Umeå University.

United Kingdom

gov.uk lists some volunteering portals but they don't make it easy to find online-only opportunities.

United Nations

https://app.unv.org/ lists online and on-site (i.e. in-person) opportunities around the world, although some of them might stretch the definition of 'voluntary roles'.

Wales

Rita Singer reports: 'In Wales, we have the People's Collection, which functions as a citizen archive of Wales' history and heritage.' https://www.peoplescollection.wales/

Crowdsourcing as connection: a constant star over a sea of change / Établir des connexions: un invariant des projets de crowdsourcing

As I'm speaking today at an event that's mostly in French, I'm sharing my slides outline so it can be viewed at leisure, or copy-and-pasted into a translation tool like Google Translate.

Closing conference of the Testaments de Poilus project, Les Archives nationales de France, 25 November 2022

Crowdsourcing as connection: a constant star over a sea of change, Mia Ridge, British Library

GLAM values as a guiding star

(Or, how will AI change crowdsourcing?) My argument is that technology is changing rapidly around us, but our skills in connecting people and collections are as relevant as ever:

  • Crowdsourcing connects people and collections
  • AI is changing GLAM work
  • But the values we express through crowdsourcing can light the way forward

(GLAM – galleries, libraries, archives and museums)

A sea of change

AI-based tools can now do many crowdsourced tasks:

  • Transcribe audio; typed and handwritten text
  • Classify / label images and text – objects, concepts, 'emotions'

AI-based tools can also generate new images, text

  • Deep fakes, emerging formats – collecting and preservation challenges

AI is still work-in-progress

Automatic transcription, translation failure from this morning: 'the encephalogram is no longer the mother of weeks'

  • Results have many biases; cannot be used alone
  • White, Western, 21st century view
  • Carbon footprint
  • Expertise and resources required
  • Not easily integrated with GLAM workflows

Why bother with crowdsourcing if AI will soon be 'good enough'?

The elephant in the room; been on my mind for a couple of years now

The rise of AI means we have to think about the role of crowdsourcing in cultural heritage. Why bother if software can do it all?

Crowdsourcing brings collections to life

  • Close, engaged attention to 'obscure' collection items
  • Opportunities for lifelong learning; historical and scientific literacy
  • Gathers diverse perspectives, knowledge

Crowdsourcing as connection

Crowdsourcing in GLAMs is valuable in part because it creates connections around people and collections

  • Between volunteers and staff
  • Between people and collections
  • Between collections

Examples from the British Library

In the Spotlight: designing for productivity and engagement

Living with Machines: designing crowdsourcing projects in collaboration with data scientists that attempt to both engage the public with our research and generate research datasets. Participant comments and questions inspired new tasks, shaped our work.

How do we follow the star?

Bringing 'crowdsourcing as connection' into work with AI

Valuing 'crowdsourcing as connection'

  • Efficiency isn't everything. Participation is part of our mission
  • Help technologists and researchers understand the value in connecting people with collections
  • Develop mutual understanding of different types of data – editions, enhancement, transcription, annotation
  • Perfection isn't everything – help GLAM staff define 'data quality' in different contexts
  • Where is imperfect, AI data at scale more useful than perfect but limited data?
  • Re-injecting data ('réinjectée') into GLAM systems – when, where, and how?
  • How does crowdsourcing, AI change work for staff?
  • How do we integrate data from different sources (AI, crowdsourcing, cataloguers), at different scales, into coherent systems?
  • How do interfaces show data provenance, confidence?
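One way to approach the provenance and confidence questions is to store each piece of data as a separate assertion that records its source and, where available, a confidence score. This is a minimal sketch with hypothetical source labels and a hypothetical priority order (cataloguer, then crowdsourcing, then machine learning). Each institution would need to set its own rules.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical source labels; a lower number is preferred when values conflict.
SOURCE_PRIORITY = {"cataloguer": 0, "crowdsourcing": 1, "ml-model": 2}

@dataclass(frozen=True)
class Assertion:
    """One statement about a collection item, plus where it came from."""
    item_id: str
    field: str
    value: str
    source: str                         # e.g. "cataloguer", "crowdsourcing", "ml-model"
    confidence: Optional[float] = None  # 0-1 model score, or None if no score exists

def display_value(assertions: List[Assertion], field: str,
                  min_confidence: float = 0.8) -> Optional[str]:
    """Choose a value to show for `field` and label it with its provenance."""
    # Drop scored assertions below the threshold; unscored ones are kept.
    candidates = [
        a for a in assertions
        if a.field == field and (a.confidence is None or a.confidence >= min_confidence)
    ]
    if not candidates:
        return None
    # Sources missing from SOURCE_PRIORITY sort after all known ones.
    best = min(candidates, key=lambda a: SOURCE_PRIORITY.get(a.source, len(SOURCE_PRIORITY)))
    label = f"{best.value} (source: {best.source}"
    if best.confidence is not None:
        label += f", confidence {best.confidence:.2f}"
    return label + ")"

assertions = [
    Assertion("item-1", "place", "Manchester", "ml-model", 0.92),
    Assertion("item-1", "place", "Salford", "crowdsourcing"),
]
print(display_value(assertions, "place"))
```

Keeping every assertion, rather than overwriting one source's value with another's, means an interface can show where a value came from, and the choice can be revisited later.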

Transforming access, discovery, use

  • A single digitised item can be infinitely linked to places, people, concepts – how does this change 'discovery'?
  • What other user needs can we meet through a combination of AI, better data systems and public participation?

Thank you for your attention!

Find out more: https://bl.uk/digital https://livingwithmachines.ac.uk

Try our crowdsourcing activity: http://bit.ly/LivingWithMachines

We welcome your questions: digitalresearch@bl.uk

Screenshot of images generated by AI, showing variations on dark blue or green seas and shining stars
Versions of image generation for the text 'a bright star over the sea'
Presenting at Les Archives nationales de France, Paris, from home

Introducing… The Collective Wisdom Handbook

I'm delighted to share my latest publication, a collaboration with 15 co-authors written in March and April 2021. It's the major output of my Collective Wisdom project, an AHRC-funded project I lead with Meghan Ferriter and Sam Blickhan.

Until August 9, 2021, you can provide feedback or comment on The Collective Wisdom Handbook: perspectives on crowdsourcing in cultural heritage:

We have published this first version of our collaborative text to provide early access to our work, and to invite comment and discussion from anyone interested in crowdsourcing, citizen science, citizen history, digital / online volunteer projects, programmes, tools or platforms with cultural heritage collections.

I wrote two posts to provide further context:

Our book is now open for 'community review'. What does that mean for you?

Announcing an 'early access' version of our Collective Wisdom Handbook

I'm curious to see how much of a difference this period of open comment makes. The comments so far have been quite specific and useful, but I'd like to know where we *really* got it right, and where we could include other examples. You need a PubPub account to comment, but after that it's pretty straightforward: select text and add a comment, or comment on an entire chapter.

Having some distance from the original writing period has been useful for me – not least, the realisation that the title should have been 'perspectives on crowdsourcing in cultural heritage and digital humanities'.

Stuck at home? View cultural heritage collections online

With people self-isolating to slow the spread of the COVID-19 pandemic, parents and educators (as well as people looking for an art or history fix) may be looking to replace in-person trips to galleries, libraries, archives and museums* with online access to images of artefacts and information about them. GLAMs have spent decades getting some of their collections digitised and online, so you can view items and information from home.

* Collectively known as 'GLAMs' because it's a mouthful to say each time

Search a bunch of GLAM portals at once

I've made a quick 'custom search engine' so you can search most of the sites above with one Google search box. Search a range of portals that collect digitised objects, texts and media from galleries, libraries, archives and museums internationally:

The direct link is https://cse.google.com/cse?cx=006190492493219194770:xw0b7dfwb6b (it's just a search box, without any context, but it means you can do a search without loading this whole post)

Collections, deep zoom and virtual tour portals

Various platforms have large collections of objects from different institutions, in formats ranging from 'virtual exhibitions' or 'tours' to 'deep zooms' to catalogue-style pages about objects. I've focused on sites that include collections from multiple institutions, but this also means some of them are huge and you'll have to explore a bit to find relevant content. Try:

Other links

Various articles have collected institution-specific links to different forms of virtual tours. Try:

Things are moving fast, so let me know about other sets of links to collections, stories and tours online that'll help people staying home get their fix of history and culture, and I'll update this post. Comment below, email me or find me at @mia_out on Twitter.

Screenshot from https://www.europeana.eu/portal/en
Europeana is just one of many online portals to images, stories, deep zooms and virtual tours / exhibitions from galleries, libraries, archives and museums internationally

Festival of Maintenance talk: Apps, microsites and collections online: innovation and maintenance in digital cultural heritage

I came to Liverpool for the 'Festival of Maintenance', a celebration of maintainers. I'm blogging my talk notes so that I'm not just preaching to the converted in the room. As they say:

'Maintenance and repair are just as important as innovation, but sometimes these ideas seem left behind. Amidst the rapid pace of innovation, have we missed opportunities to design things so that they can be fixed?'.

Liverpool 2019: Maintenance in Complex and Changing Times

Apps, microsites and collections online: innovation and maintenance in digital cultural heritage

My talk was about different narratives about 'digital' in cultural heritage organisations and how they can make maintenance harder or easier to support and resource. If last year's innovation is this year's maintenance task, how do we innovate to meet changing needs while making good decisions about what to maintain? At one museum job I calculated that c.85% of my time was spent on legacy systems, leaving less than a day a week for new work, so it's a subject close to my heart.

I began with an introduction to 'What does a cultural heritage technologist do?'. I might be a digital curator now but my roots lie in creating and maintaining systems for managing and sharing collections information and interpretative knowledge. This includes making digitised items available as individual items or computationally-ready datasets. There was also a gratuitous reference to Abba to illustrate the GLAM (galleries, libraries, archives and museums) acronym.

What do galleries, libraries, archives and museums have to maintain?

Exhibition apps and audio guides. Research software. Microsites by departments including marketing, education, fundraising. Catalogues. More catalogues. Secret spreadsheets. Digital asset management systems. Collections online pulled from the catalogue. Collections online from a random database. Student projects. Glueware. Ticketing. Ecommerce. APIs. Content on social media sites, other 3rd party sites and aggregators. CMS. CRM. DRM. VR, AR, MR.

Stories considered harmful

These stories mean GLAMs aren't making the best decisions about maintaining digital resources:

  • It's fine for social media content to be ephemeral
  • 'Digital' is just marketing, no-one expects it to be kept
  • We have limited resources, and if we spend them all maintaining things then how will we build the new cool things the Director wants?
  • We're a museum / gallery / library / archive, not a software development company, what do you mean we have to maintain things?
  • What do you mean, software decays over time? People don't necessarily know that digital products are embedded in a network of software dependencies. User expectations about performance and design also change over time.
  • 'Digital' is just like an exhibition; once it's launched you're done. You work really hard in the lead-up to the opening, but after the opening night you're free to move onto the next thing
  • That person left, it doesn't matter anymore. But people outside won't know that – you can't just let things drop.

Why do these stories matter?

If you don't make conscious choices about what to maintain, you're leaving it to fate.

Today's ephemera is tomorrow's history. Organisations need to be able to tell their own history. They also need to collect digital ephemera so that we can tell the history of wider society. (Social media companies aren't archives for your photos, events and stories.)

Better stories for the future

  • You can't save everything: make the hard choices. Make conscious decisions about what to maintain and how you'll close the things you can't maintain. Assess the likely lifetime of a digital product before you start work and build it into the roadmap.
  • Plan for a graceful exit – for all stakeholders. What lessons need to be documented and shared? Do you need to let any collaborators, funders, users or fans know? Can you make it web archive ready? How can you export and document the data? How can you document the interfaces and contextual reasons for algorithmic logic?
  • Refresh little and often, where possible. It's a pain, but it means projects stay in institutional memory
  • Build on standards, work with communities. Every collection is a special butterfly, but if you work on shared software and standards, someone else might help you maintain it. IIIF is a great example of this.

Also:

  • Check whether your websites are archiveready.com (and nominate UK websites for the UK Web Archive)
  • Look to expert advice on digital preservation
  • Support GLAMs with the legislative, rights and technical challenges of collecting digital ephemera. It's hard to collect social media, websites, podcasts, games, emerging formats, but if we don't, how will we tell the story of 'now' in the future?

And it's been on my mind a lot lately, but I didn't include it: consider the carbon footprint of cloud computing and machine learning, because we also need to maintain the planet.

In closing, I'd slightly adapt the Festival's line: 'design things so that they can be fixed or shut down when their job is done'. I'm sure I've missed some better stories that cultural institutions could tell themselves – let me know what you think!

Two of the organisers introducing the Festival of Maintenance event

Museums + AI, New York workshop notes

I’ve just spent Monday and Tuesday in New York for a workshop on ‘Museums + AI’. Funded by the AHRC and led by Oonagh Murphy and Elena Villaespesa, this was the second workshop in the year-long project.

Photo of workshop participants
Workshop participants

As there’s so much interest in artificial intelligence / machine learning / data science right now, I thought I’d revive the lost art of event blogging and share my notes. These notes are inevitably patchy, so keep an eye out for more formal reports from the team. I’ve used ‘museum’ throughout, as in the title of the event, but many of these issues are relevant to other collecting institutions (libraries, archives) and public venues. I’m writing this on the Amtrak to DC so I’ve been lazy about embedding links in text – sorry!

After a welcome from Pratt (check out their student blog https://museumsdigitalculture.prattsi.org/), Elena’s opening remarks introduced the two themes of the workshop: AI + visitor data and AI + Collections data. Questions about visitor data include whether museums have the necessary data governance and processes in place; whether current ethical codes and regulations are adequate for AI; and what skills staff might need to gain visitor insights with AI. Questions about collections data include how museums can minimise algorithmic biases when interpreting collections; whether the lack of diversity in both museum and AI staff would be reflected in the results; and the implications of museums engaging with big tech companies.

Achim Koh’s talk raised many questions I’ve had as we’ve thought about AI / machine learning in the library, including how staff traditionally invested with the authority to talk about collections (curators, cataloguers) would feel about machines taking on some of that work. I think we’ve broadly moved past that at the library if we can assume that we’d work within systems that can distinguish between ‘gold standard’ records created by trained staff and those created by software (with crowdsourced data somewhere in between, depending on the project).

John Stack and Jamie Unwin from the (UK) Science Museum shared some of the challenges of using pre-built commercial models (AWS Rekognition and Comprehend) on museum collections – anything long and thin is marked as a 'weapon' – and demonstrated a nice tool for seeing 'what the machine saw' https://johnstack.github.io/what-the-machine-saw/. They don’t currently show machine-generated tags to users, but they’re used behind the scenes for discoverability. Do we need more transparency about how search results were generated – but will machine tags ever be completely safe to show people without vetting, even if confidence scores and software versions are included with the tags?
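To make the display-versus-discoverability distinction concrete, here's a minimal Python sketch of keeping provenance (confidence score, service, model version) with each machine-generated tag and applying different rules for search indexing and public display. It isn't the Science Museum's system: the field names, thresholds, service name and the 'weapon' blocklist are all hypothetical, chosen to echo the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineTag:
    """A tag generated by software, kept with the provenance a curator might need."""
    label: str
    confidence: float   # 0.0 to 1.0, as reported by the tagging service
    service: str        # name of the (hypothetical) model or API that produced it
    model_version: str  # version string, so tags can be re-assessed when models change


def tags_for_display(tags, min_confidence=0.9, blocklist=frozenset({"weapon"})):
    """Tags shown to the public: high confidence only, minus a hypothetical
    blocklist of labels known to be unreliable for this collection."""
    return [t for t in tags if t.confidence >= min_confidence and t.label not in blocklist]


def tags_for_search(tags, min_confidence=0.5):
    """Tags used behind the scenes for discoverability can use a lower threshold,
    since a wrong tag here only adds a stray search result."""
    return [t.label for t in tags if t.confidence >= min_confidence]
```

Under these assumptions, a high-confidence 'weapon' tag on a walking stick would still help search but never be displayed, which is roughly the trade-off described above; whether any threshold makes tags safe to show without vetting is the open question.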

(If you’d like to see what all the tagging fuss is about, I have an older hands-on work sheet for trying text and images with machine classification software at https://www.openobjects.org.uk/2017/02/trying-computational-data-generation-and-entity-extraction/ )

Andrew Lih talked about image classification work with the Metropolitan Museum and Wikidata which picked up on the issue of questionable tags. Wikidata has a game-based workflow for tagging items, which in addition to tools for managing vandalism or miscreants allows them to trust the ‘crowd’ and make edits live immediately. Being able to sift incorrect from correct tags is vital – but this in turn raises questions of ‘round tripping’ – should a cultural institution ingest the corrections? (I noticed this issue coming up a few times because it’s something we’ve been thinking about as we work with a volunteer creating Wikidata that will later be editable by anyone.) Andrew said that the Met project put AI more firmly into the Wikimedia ecosystem, and that more is likely to come. He closed by demonstrating how the data created could put collections in the centre of networks of information http://w.wiki/6Bf Keep an eye out for the Wiki Art Depiction Explorer https://docs.google.com/presentation/d/1H87K5yjlNNivv44vHedk9xAWwyp9CF9-s0lojta5Us4/edit#slide=id.g34b27a5b18_0_435

Jeff Steward from Harvard Art Museums gave a thoughtful talk about how different image tagging and captioning tools (Google Vision, Imagga, Clarifai, Microsoft Cognitive Services) saw the collections, e.g. Imagga might talk about how fruit depicted in a painting tastes (sweet, juicy) or how a bowl is used (breakfast, celebration). Microsoft's tagging and captioning tools have different views of the same image and don't draw on each other.

Chris Alen Sula led a great session on ‘Ethical Considerations for AI’.

That evening, we went to an event at the Cooper Hewitt for more discussion of https://twitter.com/hashtag/MuseumsAI and the launch of their Interaction Lab https://www.cooperhewitt.org/interaction-lab/ Andrea Lipps and Harrison Pim’s talks reminded me of earlier discussion about holding cultural institutions to account for the decisions they make about AI, surveillance capitalism and more. Workshops like this (and the resulting frameworks) can provide the questions but senior staff must actually ask them, and pay attention to the answers. Karen Palmer’s talk got me thinking about what ‘democratising AI’ really means, and whether it’s possible to democratise something that relies on training data and access to computing power. Democratising knowledge about AI is a definite good, but should we also think about alternatives to AI that don’t involve classifications, and aren’t so closely linked to surveillance capitalism and ad tech?

The next day began with an inspiring talk from Effie Kapsalis on the Smithsonian Institution’s American Women’s History Initiative https://womenshistory.si.edu/ They’re thinking about machine learning and collections as data to develop ethical guidelines for AI and gender, analysing representations of women in multidisciplinary collections, enhancing data at scale and infusing the web with semantic data on historical women.

Shannon Darrough, MoMA, talked about a machine learning project with Google Arts and Culture to identify artworks in 30,000 installation photos, based on 70,000 collection images https://moma.org/calendar/exhibitions/history/identifying-art It was great at 2D works, not so much 3D, installation, moving image or performance art works. The project worked because they identified a clear problem that machine learning could solve. His talk led to discussion about sharing trained models (i.e. once software is trained to specialise in particular subjects, others can re-use the ‘models’ that are created), and the alignment between tech companies’ goals (generally, shorter-term, self-contained) and museums’ (longer-term, feeding into core systems).

I have fewer notes from talks by Lawrence Swiader (American Battlefield Trust) with good advice on human-centred processes, Juhee Park (V&A) on frameworks for thinking about AI and museums, Matthew Cock (VocalEyes) on chat bots for venue accessibility information, and Carolyn Royston and Rachel Ginsberg (on the Cooper Hewitt’s Interaction Lab), but they added to the richness of the day. My talk was on ‘operationalising AI at a national library’; my slides are online at https://www.slideshare.net/miaridge/operationalising-ai-at-a-national-library. The final activity was on ‘managing AI’, a subject that’s become close to my heart.

Notes from Digital Humanities 2019 (DH2019 Utrecht)

My rough notes from the Digital Humanities 2019 conference in Utrecht. All the usual warnings about partial attention / tendency for distraction apply. My comments are usually in brackets.

I found the most useful reference for the conference programme to be https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&presentations=show but it doesn't show the titles or abstracts for papers within panels.

Some of the places my colleagues and I were during the conference: https://blogs.bl.uk/digital-scholarship/2019/07/british-library-digital-scholarship-at-digital-humanities-2019-.html http://livingwithmachines.ac.uk/living-with-machines-at-digital-humanities-2019/

DH2019 Keynote by Francis B. Nyamnjoh, 'African Inspiration for Understanding the Compositeness of Being Human through Digital Technology'

https://dh2019.adho.org/wp-content/uploads/2019/07/Nyamnjoh_Digital-Humanities-Keynote_2019.pdf

  • Notion of complexity, and incompleteness familiar to Africa. Africans frown on attempts to simplify
  • How do notions of incompleteness provide food for thought in digital humanities?
  • Nyamnjoh decries the sense of superiority inspired by zero sum games. 'Humans are incomplete, nature is incomplete. Religious bit. No one can escape incompleteness.' (Phew! This is something of a mantra when you work with collections at scale – working in cultural institutions comes with a daily sense that the work is so large it will continue after you're just a memory. Let's embrace rather than apologise for it)
  • References books by Amos Tutuola
  • Nyamnjoh on hidden persuaders, activators. Juju as a technology of self-extension. With juju, you can extend your presence; rise beyond ordinary ways of being. But it can also be spyware. (Timely, on the day that Zoom was found to allow access to your laptop camera – this has positives and negatives)
  • Nyamnjoh: DH as the compositeness of being; being incomplete is something to celebrate. Proposes a scholarship of conviviality that takes in practices from different academic disciplines to make itself better.
  • Nyamnjoh in response to Micki K's question about history as a zero-sum game in which people argue whether something did or didn't happen: create archives that can tell multiple stories, complexify the stories that exist

DH2019 Day 1, July 10

LP-03: Space Territory GeoHumanities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=455&presentations=show

Locating Absence with Narrative Digital Maps

How to combine new media production with DH methodologies to create kit for recording and locating in the field.

Why georeference? To situate context, compare old and new maps, extract features or explore map complexity.

Maps Re-imagined: Digital, Informational, and Perceptional Experimentations in Progress by Tyng-Ruey Chuang, Chih-Chuan Hsu, Huang-Sin Syu used OpenStreetMap with historical Taiwanese maps. Interesting base map options inc ukiyo style https://bcfuture.github.io/tileserver/Switch.html

Oceanic Exchanges: Transnational Textual Migration And Viral Culture

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=477&presentations=show Oceanic Exchanges studies the flow of information, searching for historical-literary connections between newspapers around the world; it seeks to push the boundaries of research with newspapers.

  • Challenges: imperfect comparability of corpora – data is provided in different ways by each data provider; no unifying ontology between archives (no generic identification of specific items); legal restrictions; TEI and other work hasn't been suitable for newspaper research
  • Limited ability to conduct research across repositories. Deep semantic multilingual text mining remains a challenge. Political (national) and practical organisation of archives currently determines questions that can be asked, privileges certain kinds of enquiry.
  • Oceanic Exchanges project includes over 100 million pages. Corpus exploration tool needed to support: exploring data (metadata and text); other things that went by too quickly.

The Past, Present and Future of Digital Scholarship with Newspaper Collections

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=483&presentations=show

I was on this panel so I tweeted a bit but have no notes myself.

Working with historical text (digitised newspapers, books, whatever) collections at scale has some interesting challenges and rewards. Inspired by all the newspaper sessions? Join an emerging community of practitioners, researchers and critical friends via this document from a 'DH2019 Lunch session – Researchers & Libraries working together on improving digitised newspapers' https://docs.google.com/document/d/1JJJOjasuos4yJULpquXt8pzpktwlYpOKrRBrCds8r2g/edit

Complexities, Explainability and Method

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=486&presentations=show I enjoyed listening to this panel which is so far removed from my everyday DH practice.

Other stuff

Tweet: If you ask a library professional about digitisating (new word alert!) a specific collection and they appear to go quiet, this is actually what they're doing – digitisation takes shedloads of time and paperwork https://twitter.com/CamDigLib/status/1148888628405395456

Posters

@LibsDH ADHO Lib & DH SIG meetup

There was a lunchtime meeting for 'Libraries and Digital Humanities: an ADHO Special Interest Group', which was a lovely chance to talk libraries / GLAMs and DH. You can join the group via https://docs.google.com/forms/d/e/1FAIpQLSfswiaEnmS_mBTfL3Bc8fJsY5zxhY7xw0auYMCGY_2R0MT06w/viewform or the mailing list at http://lists.digitalhumanities.org/mailman/listinfo/libdh-sig

DH2019 Day 2, July 11

XR in DH: Extended Reality in the Digital Humanities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=523&presentations=show

Another panel where I enjoyed listening and learning about a field I haven't explored in depth. Tweet from the Q&A: 'Love the 'XR in DH: Extended Reality in the Digital Humanities' panel responses to a question about training students only for them to go off and get jobs in industry: good! Industry needs diversity, PhDs need to support multiple career paths beyond academia'

Data Science & Digital Humanities: new collaborations, new opportunities and new complexities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=532&presentations=show Beatrice Alex, Anne Alexander, David Beavan, Eirini Goudarouli, Leonardo Impett, Barbara McGillivray, Nora McGregor, Mia Ridge

My work with open cultural data has led to me asking 'how can GLAMs and data scientists collaborate to produce outcomes that are useful for both?'. Following this, I presented a short paper, more info at https://www.openobjects.org.uk/2019/07/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science/ https://www.slideshare.net/miaridge/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science.

As summarised in tweets:

  • https://twitter.com/semames1/status/1149250799232540672, 'data science can provide new routes into library collections; libraries can provide new challenging sources of information (scale, untidy data) for data scientists';
  • https://twitter.com/sp_meta/status/1149251010025656321 'library staff are often assessed by strict metrics of performance – items catalog, speed of delivery to reading room – that isn’t well-matched to messy, experimental collaborations with data scientists';
  • https://twitter.com/melissaterras/status/1149251480576303109 'Copyright issues are inescapable… they are the background noise to what we do';
  • https://twitter.com/sp_meta/status/1149251656720289792 'How can library infrastructure change to enable collaboration with data scientists, encouraging use of collections as data and prompting researchers to share their data and interpretations back?';
  • (me) 'I'm wondering about this dichotomy between 'new' or novel, and 'useful' or applied – is there actually a sweet spot where data scientists can work with DH / GLAMs or should we just apply data science methods and also offer collections for novel data science research? Thinking of it as a scale of different aspects of 'new to applied research' rather than a simple either/or'.

SP-19: Cultural Heritage, Art/ifacts and Institutions

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=462&presentations=show

“Un Manuscrit Naturellement”: Rescuing a library buried in digital sand

  • 1979, agreement with Ministry of Culture and IRHT to digitise all manuscripts stored in French public libraries. (Began with microfilm, not digital). Safe, but not usable. Financial cost of preserving 40TB of data was prohibitive, but BnF started converting TIFFs to JP2 which made storage financially feasible. Huge investment by France in data preservation for digitised manuscripts.
  • Big data cleaning and deduplication process, got rid of 1 million files. Discovered errors in TIFF when converting to JP2. Found inconsistencies with metadata between databases and files. 3 years to do the prep work and clean the data!
  • ‘A project which lasts for 40 years produces a lot of variabilities’. Needed a team, access to proper infrastructure; the person with memory of the project was key.
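Byte-for-byte duplicates are the easy part of a clean-up like this. As a minimal sketch (assuming files on a local filesystem, and not a description of how the BnF or IRHT actually did it), Python's standard library can group identical files by content hash:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group every file under `root` (recursively) by content hash and
    return only the groups that contain more than one file."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[file_digest(path)].append(path)
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

This only catches exact copies. A manuscript that was re-scanned or re-encoded (say, from TIFF to JP2) will hash differently, and matching those would need image comparison plus the kind of metadata checks the speakers described.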

A Database of Islamic Scientific Manuscripts — Challenges of Past and Future

  • (Following on from the last paper, digital preservation takes continuous effort). Moving to RDF model based on CIDOC-CRM, standard triple store database, standard ResearchSpace/Metaphactory front end. Trying to separate the data from the software to make maintenance easier.

Analytical Edition Detection In Bibliographic Metadata; The Emerging Paradigm of Bibliographic Data Science

  • Tweet: Two solid papers on a database for Islamic Scientific Manuscripts and data science work with the ESTC (English Short Title Catalogue) plus reflections on the need for continuous investment in digital preservation. Back on familiar curatorial / #MuseTech ground!
  • Lahti – Reconciling / data harmonisation for early modern books is so complex that there are different researchers working on editions, authors, publishers, places

Syriac Persons, Events, and Relations: A Linked Open Factoid-based Prosopography

  • Prosopography and factoids. His project relies heavily on authority files that http://syriaca.org/ produces. Modelling factoids in TEI; usually it’s done in relational databases.
  • Prosopography used to be published as snippets of narrative text about people for whom enough information was available
  • Factoid – a discrete piece of prosopographical information asserted in a primary source text and sourced to that text.
  • Person, event and relation factoids. Researcher attribution at the factoid level. Using TEI because (as markup around the text) it stays close to the primary source material; can link out to controlled vocabulary
  • Srophe app – an open source platform for cultural heritage data used to present their prosopographical data https://srophe.app/
  • Harold Short says how pleased he is to hear a project like that taking the approach they have; TEI wasn’t available as an option when they did the original work (seriously beautiful moment)
  • Why SNAP? ‘FOAF isn’t really good at describing relationships that have come about as a result of slave ownership’
  • More on factoid prosopography via Arianna Ciula https://factoid-dighum.kcl.ac.uk/
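For readers more used to catalogue records than prosopography, one rough way to picture a factoid is as a small record that never loses its link to the source text or to the researcher who recorded it. The project described above models this in TEI; the Python dataclasses below are only an illustrative sketch of the same shape, with hypothetical field names and identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Source:
    """A primary source text and a (hypothetical) locator such as a page or folio."""
    title: str
    locator: str


@dataclass(frozen=True)
class Factoid:
    """One discrete assertion made in a primary source and sourced to that text."""
    kind: str                          # "person", "event" or "relation"
    subject: str                       # identifier from an authority file
    assertion: str                     # what the source says, in normalised form
    source: Source                     # every factoid points back to its source
    recorded_by: str                   # researcher attribution at the factoid level
    related_to: Optional[str] = None   # the other party, for relation factoids


def factoids_about(person_id, factoids):
    """Collect every factoid mentioning a person, as subject or as the other
    party in a relation, with sources and attributions still attached."""
    return [f for f in factoids if person_id in (f.subject, f.related_to)]
```

The point of the design is that a person is assembled from sourced assertions rather than written up as a single narrative, so conflicting sources can sit side by side.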

DH2019 Day 3, July 12

Complexities in the Use, Analysis, and Representation of Historical Digital Periodicals

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=527&presentations=show

  • Torsten Roeder: Tracing debate about a particular work through German music magazines and daily newspapers. OCR and mass digitisation made it easier to compose representative text corpora about specific subjects. Authorship information isn’t available so don’t know their backgrounds etc, means a different form of analysis. ‘Horizontal reading’ as a metaphor for his approach. Topic modelling didn’t work for looking for music criticism.
  • Roeder's requirements: accessible digital copies of newspapers; reliable metadata; high quality OCR or transcriptions; article borders; some kind of segmentation; deep semantic annotation – ‘but who does what?’ What should collection holders / access providers do, and what should researchers do? (e.g. who should identify entities and concepts within texts? This question was picked up in other discussion in the session, on twitter and at an impromptu lunchtime meetup)
  • Zef Segal. The Periodical as a Geographical Space. The relation between the two isn’t unidirectional. Imagined space is constructed by the text and its layout; periodicals construct an imaginary space that refers back to the real. Headlines, paratext, regular text. Divisions between articles. His case study for exploring the issues: HaZefirah. (sample slide image https://twitter.com/mia_out/status/1149581497680052224)
  • Nanette Rißler-Pipka, Historical Periodicals Research, Opportunities and Limitations. The limitations she encounters as a researcher. Building a corpus of historical periodicals for a research question often means using sources from more than one provider of digitised texts. Different searches, rights, structure. (The need for multiple forms of interoperability, again)
  • Wants article / ad / genre classifications. For metadata wants, bibliographical data about the title (issue, date); extractable data (dates, names, tables of contents), provenance data (who digitised, when?). When you download individual articles, you lose the metadata which would be so useful for research. Open access is vital; interoperability is important; the ability to create individual collections across individual libraries is a wonderful dream
  • Estelle Bunout. Impresso providing exploration tools (integrate and decomplexify NLP tools in current historical research workflows). https://impresso-project.ch/app/#/
  • Working on: expanding a query – find neighbouring terms and frequent OCR errors. Overview of query: where and when is it? Whole corpus has been processed with topic modelling.
  • Complex queries: help me find the mention of places, countries, person in a particular thematic context. Can save to collection or export for further processing.
  • See the unsearchable: missing issues, failure to digitise issues, failure to OCRise, corrupt files
  • Transparency helps researchers discover novel opportunities and make informed decisions about sources.
  • Clifford Wulfman – how to support transcriptions, linked open data that allows exploration of notions of periodicity, notions of the periodical. My tweet: Clifford Wulfman acknowledging that libraries don't have the resources to support special 'snowflake' projects because they're working to meet the most common needs. IME this question/need doesn't go away so how best to tackle and support it?
  • Q&A comment: what if we just put all newspapers on Impresso? Discussion of standardisation, working jointly, collaborating internationally
  • Melodee Beals comments: libraries aren’t there just to support academic researchers, academics could look to supporting the work of creative industries, journalists and others to make it easier for libraries to support them.
  • Subject librarian from Leiden University points out that copyright limits their ability to share newspapers after 1880. (Innovating is hard when you can't even share the data)
  • Nanette Rißler says researchers don't need fancy interfaces, just access to the data (which probably contradicts the need for 'special snowflake' systems and explains why libraries can never ever make all users happy)
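Impresso's work on expanding queries with frequent OCR errors is easy to underestimate until you've searched dirty OCR. The sketch below shows one simple way such suggestions could be generated: find words in the corpus vocabulary within a small edit distance of the search term and rank them by frequency. It's an assumption for illustration, not Impresso's method, and the vocabulary and counts in the example are made up; in practice they would come from the corpus.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def expand_query(term, vocabulary_counts, max_distance=1, limit=5):
    """Suggest corpus words within `max_distance` edits of `term`, most frequent first."""
    candidates = [
        (word, count) for word, count in vocabulary_counts.items()
        if word != term and edit_distance(term, word) <= max_distance
    ]
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [word for word, _ in candidates[:limit]]


if __name__ == "__main__":
    vocab = {"music": 100, "rnusic": 12, "musie": 7, "magic": 50, "mustc": 3}
    print(expand_query("music", vocab))                  # ['musie', 'mustc']
    print(expand_query("music", vocab, max_distance=2))  # ['magic', 'rnusic', 'musie', 'mustc']
```

Raising `max_distance` catches more OCR variants (such as 'rn' misread for 'm') but also lets in real words like 'magic', which is one reason a researcher-facing tool might show suggestions rather than silently add them to the query.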

LP-34: Cultural Heritage, Art/ifacts and Institutions

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=516&presentations=show

(I was chairing so notes are sketchier)

  • Mark Hill, early modern (1500-1800 but 18thC in particular) definitions of ‘authorship’. How does authorship interact with structural aspects of publishing? Shift of authorship from gentlemanly to professional occupation.
  • Using the ESTC. Has about 1m actors, 400k documents with actors attached to them. Actors include authors, editors, publishers, printers, translators, dedicatees. Early modern print trade was ‘trade on a human scale’. People knew each other: ‘hand-operated printing press required individual actors and relationships’.
  • As time goes on, printers work with fewer, publishers work with more people, authors work with about the same number of people.
  • They manually created a network of people associated with Bernard Mandeville and compared it with a network automatically generated from ESTC.
  • Looking at a work network for Edmond Hoyle’s Short Treatise on the Game of Whist. (Today I learned that Hoyle's Rules, determiner of victory in family card games and of 'according to Hoyle' fame, dates back to a book on whist in the 18thC)
  • (Really nice use of social network analysis to highlight changes in publisher and authorship networks.) Eigenvector centrality is very good at finding important actors. In the English Civil War, who you know does matter when it comes to publishing. By the 18thC publishers really matter. See http://ceur-ws.org/Vol-2364/19_paper.pdf for more.
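Eigenvector centrality rewards being connected to well-connected people, which is why it suits questions like 'who mattered in the 18thC book trade?'. Here's a minimal power-iteration sketch in plain Python on a toy network. It's not the authors' code, the node names in the example are hypothetical, and for real work a library function such as networkx's `eigenvector_centrality` would be the usual choice.

```python
import math


def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Eigenvector centrality of an undirected, unweighted network by power iteration.

    A node scores highly when it is connected to other high-scoring nodes.
    Scores are normalised to unit Euclidean length at each step.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if not neighbours:
        return {}
    scores = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        # New score = own score + sum of neighbours' scores (i.e. multiply by A + I).
        new = {n: scores[n] + sum(scores[m] for m in neighbours[n]) for n in neighbours}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - scores[n]) for n in new) < tol:
            return new
        scores = new
    return scores


if __name__ == "__main__":
    edges = [("publisher", "author1"), ("publisher", "author2"),
             ("publisher", "printer"), ("printer", "author3")]
    scores = eigenvector_centrality(edges)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

Adding each node's own score at every step (working with the adjacency matrix plus the identity) doesn't change which eigenvector is found for a connected network, but it helps the iteration converge on networks that split into two groups that only link to each other.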

Richard Freedman, David Fiala, Andrew Janco et al

  • What is a musical quotation? Borrowing, allusion, parody, commonplace, contrafact, cover, plagiat, sampling, signifying.
  • Tweet: Freedman et al.'s slides for 'Citations: The Renaissance Imitation Mass (CRIM) and The Quotable Musical Text in a Digital Age' https://bit.ly/CRIM_Utrecht are a rich introduction to applications of #DigitalMusicology encoding and markup
  • I spend so much time in text worlds that it's really refreshing to hear from musicologists who play music to explain their work and place so much value on listening while also exploiting digital processing tools to the max

Digging Into Pattern Usage Within Jazz Improvisation (Pattern History Explorer, Pattern Search and Similarity Search) Frank Höger, Klaus Frieler, Martin Pfleiderer

Impromptu meetup to discuss issues raised around digitised newspapers research and infrastructure

See notes about DH2019 Lunch session – Researchers & Libraries working together on improving digitised newspapers. 20 or more people joined us for a discussion of the wonderful challenges and wish lists from speakers, thinking about how we can collaborate to improve the provision of digitised newspapers / periodicals for researchers.

Theorising the Spatial Humanities panel

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=539&presentations=show

  • ?? Space as a container for understanding, organising information. Chorography, the writing of the region.
  • Tweet: In the spatial humanities panel where a speaker mentions chorography, which along with prosopography is my favourite digital-history-enabled-but-also-old concept
  • Daniel Alves. Do history and literature researchers feel the need to incorporate spatial analysis in their work? A large number who do don’t use GIS. Most of them don’t believe in it (!). The rest are so tired that they prefer theorising (!!) His goal, ref last night keynote, is not to build models, tools, the next great algorithm; it’s to advance knowledge in his specific field.
  • Tweet: @DanielAlvesFCSH Is #SpatialDH revolutionary? Do history and literature researchers feel the need to incorporate spatial analysis in their work? A large number who do don’t use GIS. Most of them don’t believe in it(!). The rest are so tired that they prefer theorising(!!)
  • Tweet: @DanielAlvesFCSH close reading is still essential to take in the inner subjectivity of historical / literary sources with a partial and biased conception of space and place
  • Tien Danniau, Ghent Centre for Digital Humanities – deep maps. How is the concept working for them?
  • Tweet: Deep maps! A slide showing some of the findings from the 2012 NEH Advanced Institute on spatial narratives and deep mapping, which is where I met many awesome DH and spatial history people #DH2019 pic.twitter.com/JiQepz7kH5
  • Katie McDonough, Spatial history between maps and texts: lessons from the 18thC. Refers to Richard White’s spatial history essay in her abstract. Rethinking geographic information extraction. Embedded entities, spatial relations, other stuff.
  • Tweet: @khetiwe24 references work discussed in https://www.tandfonline.com/doi/abs/10.1080/13658816.2019.1620235?journalCode=tgis20 … noting how the process of annotating texts requires close reading that changes your understanding of place in the text (echoing @DanielAlvesFCSH 's earlier point)
  • Tweet: Final #spatialDH talk 'towards spatial linguistics' #DH2019 https://twitter.com/mia_out/status/1149666605258829824
  • Tweet: #DH2019 Preserving deep maps? I'd talk to folk in web archiving for a sense of which issues re recording complex, multi-format, dynamic items are tricky and which are more solvable

Closing keynote: Digital Humanities — Complexities of Sustainability, Johanna Drucker

(By this point my laptop and mental batteries were drained, so I just listened and tweeted. I was also taking part in a conversation about the environmental sustainability of travelling to conferences, issues with access to visas and funding, etc., that might be alleviated by better incorporating talks from remote presenters, or even having everyone present online.)

Finally, the DH2020 conference is calling for reviewers. Reviewing is an excellent way to give something back to the DH community while learning about the latest work as it appears in proposals, and perhaps more importantly, learning how to write a good proposal yourself. Find out more: http://dh2020.adho.org/cfps/reviewers/