Links for a talk on crowdsourcing at UCL

I'm giving a lecture on 'crowdsourcing at the British Library' for students on UCL's MSc Sustainable Heritage taking the course 'Crowd-Sourced and Citizen Data for Cultural Heritage' (BENV0114).

As some links at the British Library are still down, I've put the web archive versions into a post, along with other links included in my talk:

Crowdsourcing at the British Library https://web.archive.org/web/20230401000000*/https://www.bl.uk/projects/crowdsourcing-at-the-british-library

The Collective Wisdom Handbook: perspectives on crowdsourcing in cultural heritage https://britishlibrary.pubpub.org/

The Collective Wisdom project website https://collectivewisdomproject.org.uk/

The Collective Wisdom 'Recommendations, Challenges and Opportunities for the Future of Crowdsourcing in Cultural Heritage: a White Paper' https://collectivewisdomproject.org.uk/new-release-collective-wisdom-white-paper/ (commentable version: https://docs.google.com/document/d/1HNEshEmS01CIdM31vX68Tg90-CNVSY0R2zbd7EUaCIA/edit?usp=sharing)

Living with Machines on Zooniverse http://bit.ly/LivingWithMachines. Some background: https://livingwithmachines.ac.uk/why-is-the-communities-lab-asking-people-to-read-old-news/

Digital scholarship at the British Library: https://bl.uk/digital (web archive version: https://web.archive.org/web/20230601060241/https://www.bl.uk/subjects/digital-scholarship)

Finding Digital Heritage / GLAM tech / Digital Humanities jobs / staff

Technology job listings for cultural heritage or the humanities aren't always easy to find. I've recently been helping recruit into various tech roles at the British Library while also answering questions from folk looking for work in the digital heritage / GLAM (galleries, libraries, archives, museums) tech / digital humanities (DH)-ish world, so I've collated some notes on where to look for job ads. Plus, some bonus thoughts on preparing for a job search and applying for jobs.

Preparing for a job search / post

If you're looking for work, setting up alerts or subscribing to various job sites can give you a sense of what's out there, and the skills and language you'd want to include in your CV or portfolio. If you're going to advertise vacancies, it helps to get a sense of how others describe their jobs.

Lurking on Slacks and mailing lists gives you exposure to local jargon. Even better, post occasionally to help someone with a question, as people might recognise your name later. Events – meetups, conferences, seminars, etc – can be good for meeting people and learning more about a sector. Serendipitous casual chats are easier in person, but online events are more accessible.

Applying for GLAM/DH jobs

You probably know this, but sometimes a reminder helps… it's often worth applying for a job where you have most, but not all, of the required skills. Job profiles are often wish lists rather than complete specs. That said, pay attention to the language used around different 'essential' vs 'desirable' requirements, as that can save you some time.

Please, please pay attention to the questions asked during the application process, and figure out (or ask) how they're shortlisting based on those questions. At the BL we can only shortlist with information that applicants provide in response to questions on the application. In other places, reflecting the language and specific requirements of the job ad and profile in your cover letter matters more. And I'm sorry if you've spent ages on your CV, but never assume that people can see it during the shortlisting or interview process.

If you see a technology or method that you haven't tried, getting familiar with it before an interview can take you a long way. Download and try it, watch videos, whatever – showing willing and being able to relate it to your stronger skills helps.

Speaking of interviews, these interview tips might help, particularly preparing potential answers using the STAR (situation, task, action, result) method.

The UK GLAM sector tends not to be able to offer visa sponsorship, but remote contracts may be possible. Always read the fine print…

Translate job descriptions and profiles to help candidates understand your vacancy

Updating to add: public organisations often have obscure job titles and descriptions. You can help translate jargon and public / charity / arts / academic sector speak into something closer to the language potential candidates might understand by writing blog posts and social media / discussion list messages that explain what the job actually involves, why it exists, and what a typical day or week might look like.

Cultural Heritage/Digital Humanities Slacks – most of these have jobs channels

GLAM/DH Mailing lists

Museum Computer Network (MCN) – mostly US
Museums Computer Group (MCG) – mostly UK
GLAM Labs
Code4Lib discussion list
DHumanist
Is there something similar for technologists in archiving? Let me know!

Job sites with GLAM/DH vacancies

Jobs.ac.uk
Code4Lib jobs
Digital Library Federation (DLF) jobs
Digital Preservation Coalition jobs board (thanks Angela Puggioni for the suggestion)
Apparently you can post 'briefs, invitations to tender and freelance opportunities to Culture Briefs' for mailing to consultants, freelancers and agencies specialising in the arts, heritage and culture sectors in the UK – or conversely, sign up for updates.

'Resonating with different frequencies' – notes for a talk on the Le Show archive

I met dr. rosa a. eberly, associate professor of rhetoric at Pennsylvania State University, when she took my and Thomas Padilla's 'Collections as Data' course at the HILT summer school in 2018. When she got in touch to ask if I could contribute to a workshop on Harry Shearer's Le Show archive, of course I said yes! That event became the CAS 2023 Summer Symposium on Harry Shearer's "Le Show".

My slides for 'Resonating with different frequencies… Thoughts on public humanities through crowdsourcing in a ChatGPT world' are online at Zenodo. My planned talk notes are below.

Banner from Harry Shearer's Le Show archive, featuring a photo of Shearer. Text says 'Vogue magazine describes Le Show as "wildly clever, iconoclastic stew of talk, music, political commentary, readings of inadvertently funny public documents or trade magazines and scripted skits."'

Opening – I’m sorry I can’t be in the room today, not least because the programme lists so many interesting talks.

Today I wanted to think about why public humanities work through crowdsourcing still has a place in an AI-obsessed world… what happens if we think about different ways of ‘listening’ to an audio archive like Le Show: by people, by machines, and by people and machines in combination?

What visions can we create for a future in which people and machines tune into different frequencies, each doing what they do best?

Overview

My work in crowdsourcing / data science in GLAMs
What can machines do?
The Le Show archive (as described by Rosa)
Why do we still need people listening to Le Show and other audio archives?

My current challenge is working out the role of crowdsourcing when 'AI can do it all'…

Of course AI can't, but we need to articulate what people and what machines can do so that we can set up systems that align with our values.

If we leave it to the commercial sector and pure software guys, there’s a risk that people are regarded as part of the machine, or are replaced by AI rather than aided by it.

[Then I did a general 'crowdsourcing and data science in cultural heritage / British Library / Living with Machines' bit]

Given developments in 'AI' (machine learning)… What can AI/data science do for audio?

Transcribe speech for text-based search, methods
Detect some concepts, entities, emotions –> metadata for findability
Support 'distant reading'

–Shifts, motifs, patterns over time

–Collapse hours, years – take time out of the equation

Machine listening?

–Use 'similarity' to find sonic (not text) matches?

[Description of the BBC World Service Archive experiments c. 2012 combining crowdsourcing with early machine learning https://www.bbc.co.uk/blogs/researchanddevelopment/2012/11/the-world-service-archive-prot.shtml]

Le Show (as described by Rosa)

A massive 'portal' of 'conceptual and sonic hyperlinks to late-20th- and early-21st-century news and culture'
A 'polyphonic cornucopia of words and characters, lyrics and arguments, fact and folly'
'resistant to datafication'
With koine topoi – issues of common or public concern

'Harry Shearer is a portal: Learn one thing from Le Show, and you’ll quickly learn half a dozen more by logical consequence'
dr. rosa a. eberly

(Le Show reminds me of a time when news was designed to inform more than enrage.)

Why let machines have all the fun?

People can hear a richer range of emotions, topics and references, recognise impersonations and characters -> better metadata, findability

What can’t machines do? Software might be able to transcribe speech with pretty high accuracy, but it can't (reliably)… recognise humour, sarcasm, rhetorical flourishes, impersonations and characters – all the wonderful characteristics of the Le Show archive that Rosa described in her opening remarks yesterday. A lot of emotions aren’t covered in the ‘big 8’ that software tries to detect.

Software can recognise some subjects, e.g. those that have Wikipedia entries, but it would still miss so much of what people can hear.

So, people can do a better job of telling us what's in the archive than computers can. Together, people and computers can help make specific moments more findable, and create metadata that could be used to visualise links between shows by topic, tone, music and more.

A 1950s black and white photo of a man knitting while relaxing in an armchair — Jay Yoder with His Knitting

Could access to history in the raw, to 'koine topoi', be a superpower?

Individual learning via crowdsourcing contributes to an informed, literate society

It's not all about the data. Crowdsourcing creates a platform and a reason for engagement. Your work helps others, but it also helps you.

I've shown some of my work with objects from the history of astronomy, playbills for 19th-century British theatre performances, and most recently, newspaper articles from the long 19th century.

Through this work, I've come to believe that giving people access to original historical sources is one of the most important ways we can contribute to an informed, literate society.

A society that understands where we've come from, and what that means for where we're going.

A society that is less likely to fall for predictions of AI doom or AI fantasies, because they've seen tech hype before.

A society that is less likely to believe that 'AI might take your job' because they know that the executives behind the curtain are the ones deciding whether AI helps workers or 'replaces' them.

I've worried about whether volunteers would be motivated to help transcribe audio or text, classify or tag images, when 'AI can do it'. But then I remembered that people still knit jumpers (sweaters) when they can buy them far more quickly and cheaply.

So, crowdsourcing still has a place. The trick is to find ways for 'AI' to aid people, not replace them: to figure out which bits are boring and which bits software is great at, so that people can spend more time on the fun bits.

Harry Shearer's ability to turn something into a topic ('news of microplastics', news of bees) is something of a superpower. Amplifying those messages is another gift, one the public can create by and for themselves.

National approaches to crowdsourcing / citizen science?

This is a 'work in progress' post that I hope to add to as I gather information about national portals for crowdsourcing / citizen science / citizen history and other forms of voluntary digital / online participation.

While portals like SciStarter, Crowds4U and platforms like Zooniverse, FromThePage, HistoryPin etc are a great way to search across projects for something that matches your interests, I'm interested in the growth of national portals or indexes to projects (they might also be called 'project finders'). It's not so much the sites themselves that interest me as the underlying networks of regional communities of practice, national or regional infrastructure and other signs of national support that they might variously reflect or help create. If you're interested in specific projects outside the UK-US/English-language bubble, check out Crowdsourcing the world's heritage. I've also shared a 2015 list of 'participatory digital heritage sites' that includes many crowdsourcing sites.

If you know of a national portal or umbrella organisation for crowdsourcing, please drop me a line! Last updated: Feb 7, 2023.

Austria

Jan Smeddinck emailed to share the LBG Open Innovation in Science Center https://ois.lbg.ac.at/

Brazil

Lesandro Ponciano nominated 'Civis, which is the Brazilian Citizen Science platform. The link is https://civis.ibict.br/ Civis was built by using the same software developed by Ibercivis in Spain for the eu-citizen.science platform. Civis was launched in 2022 – the event (in Portuguese) is recorded on YouTube at https://www.youtube.com/live/_nPqmcq0gos'

Canada

The Canadian Citizen Science portal

France

This post was inspired by the apparently coordinated approach in France. The Archives nationales participatives site has 'Projets collaboratifs de transcriptions, annotations et indexations' – that is, participatory national archives with collaborative transcription, annotation and indexing projects.

They also have Le réseau Particip-Arc, a 'network of actors committed to participatory science in the fields of culture', supported by the Ministry of Culture and coordinated by the National Museum of Natural History.

European Union

EU-citizen.science is a 'platform for sharing citizen science projects, resources, tools, training and much more'.

Germany / German-language projects

The German / German-language citizen science portal

Netherlands

Alastair Dunning pointed to the Citizen Science network, run by @CitSciLab (Margaret Gold).

Norway

Agata Bochynska said, 'Norway has recently formed a national network for citizen science that’s coordinated by Research Council of Norway' – Nasjonalt nettverk for folkeforskning (folkeforskning translates as 'folk research' according to Google).

Scotland

The Scottish Citizen Science portal

Slovenia

https://citizenscience.si/ lists current and completed citizen science projects in Slovenia, infrastructure available to support projects, and events and other activities. Hat tip Mitja V. Iskrić on mastodon.

Sweden

David Haskiya reports: 'medborgarforskning.se/ Provides an intro to citizen science, a catalogue of Swedish projects, etc. Seems to be part of an EU-network of such sites. Summary in English here https://medborgarforskning.se/eng/'

A Swedish national hub for everyone interested in citizen science (medborgarforskning). The project was funded by Vinnova (Sweden’s innovation agency), the University of Gothenburg, the Swedish University of Agricultural Sciences and Umeå University.

United Kingdom

gov.uk lists some volunteering portals but they don't make it easy to find online-only opportunities.

United Nations

https://app.unv.org/ lists online and on-site (i.e. in-person) opportunities around the world, although some of them might stretch the definition of 'voluntary roles'.

Wales

Rita Singer reports: 'In Wales, we have the People's Collection, which functions as a citizen archive of Wales' history and heritage.' https://www.peoplescollection.wales/

Introducing… The Collective Wisdom Handbook

I'm delighted to share my latest publication, a collaboration with 15 co-authors written in March and April 2021. It's the major output of my Collective Wisdom project, an AHRC-funded project I lead with Meghan Ferriter and Sam Blickhan.

Until August 9, 2021, you can provide feedback or comment on The Collective Wisdom Handbook: perspectives on crowdsourcing in cultural heritage:

We have published this first version of our collaborative text to provide early access to our work, and to invite comment and discussion from anyone interested in crowdsourcing, citizen science, citizen history, digital / online volunteer projects, programmes, tools or platforms with cultural heritage collections.

I wrote two posts to provide further context:

Our book is now open for 'community review'. What does that mean for you?

Announcing an 'early access' version of our Collective Wisdom Handbook

I'm curious to see how much of a difference this period of open comment makes. The comments so far have been quite specific and useful, but I'd like to know where we *really* got it right, and where we could include other examples. You need a PubPub account to comment, but after that it's pretty straightforward: select text and add a comment, or comment on an entire chapter.

Having some distance from the original writing period has been useful for me – not least, the realisation that the title should have been 'perspectives on crowdsourcing in cultural heritage and digital humanities'.

About 'a practical guide to crowdsourcing in cultural heritage'

Some time ago I wrote a chapter on 'Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects' for the Routledge International Handbook of Research Methods in Digital Humanities, edited by Kristen Schuster and Stuart Dunn. As their blurb says, the volume 'draws on both traditional and emerging fields of study to consider what a grounded definition of quantitative and qualitative research in the Digital Humanities (DH) might mean; which areas DH can fruitfully draw on in order to foster and develop that understanding; where we can see those methods applied; and what the future directions of research methods in Digital Humanities might look like'.

Inspired by a post from the authors of a chapter in the same volume (Opening the ‘black box’ of digital cultural heritage processes: feminist digital humanities and critical heritage studies by Hannah Smyth, Julianne Nyhan & Andrew Flinn), I'm sharing something about what I wanted to do in my chapter.

As the title suggests, I wanted to provide practical insights for cultural heritage and digital humanities practitioners. Writing for a Handbook of Research Methods in Digital Humanities was an opportunity to help researchers understand both how to apply the 'method' and how the 'behind the scenes' work affects the outcomes. As a method, crowdsourcing in cultural heritage touches on many other methods and disciplines. The chapter built on my doctoral research, and my ideas were road-tested at many workshops, classes and conferences.

Rather than crib from my introduction (which you can read in a pre-edited version online), I've included the headings from the chapter as a guide to the contents:

An introduction to crowdsourcing in cultural heritage
Key conceptual and research frameworks
Fundamental concepts in cultural heritage crowdsourcing
Why do cultural heritage institutions support crowdsourcing projects?
Why do people contribute to crowdsourcing projects?
Turning crowdsourcing ideas into reality
Planning crowdsourcing projects
Defining 'success' for your project
Managing organisational impact
Choosing source collections
Planning workflows and data re-use
Planning communications and participant recruitment
Final considerations: practical and ethical ‘reality checks’
Developing and testing crowdsourcing projects
Designing the ‘onboarding’ experience
Task design
Documentation and tutorials
Quality control: validation and verification systems
Rewards and recognition
Running crowdsourcing projects
Launching a project
The role of participant discussion
Ongoing community engagement
Planning a graceful exit
The future of crowdsourcing in cultural heritage
Thanks and acknowledgements

I wrote in the open on this Google Doc: 'Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects', and benefited from the feedback I got during that process, so this post is also an opportunity to highlight and reiterate my 'Thanks and acknowledgements' section:

I would like to thank participants and supporters of crowdsourcing projects I’ve created, including Museum Metadata Games, In their own words: collecting experiences of the First World War, and In the Spotlight. I would also like to thank my co-organisers and attendees at the Digital Humanities 2016 Expert Workshop on the future of crowdsourcing. Especial thanks to the participants in courses and workshops on ‘crowdsourcing in cultural heritage’, including the British Library’s Digital Scholarship training programme, the HILT Digital Humanities summer school (once with Ben Brumfield) and scholars at other events where the course was held, whose insights, cynicism and questions have informed my thinking over the years. Finally, thanks to Meghan Ferriter and Victoria Van Hyning for their comments on this manuscript.

References for Crowdsourcing in cultural heritage: a practical guide to designing and running successful projects

Alam, S. L., & Campbell, J. (2017). Temporal Motivations of Volunteers to Participate in Cultural Crowdsourcing Work. Information Systems Research. https://doi.org/10.1287/isre.2017.0719

Bedford, A. (2014, February 16). Instructional Overlays and Coach Marks for Mobile Apps. Retrieved 12 September 2014, from Nielsen Norman Group website: http://www.nngroup.com/articles/mobile-instructional-overlay/

Berglund Prytz, Y. (2013, June 24). The Oxford Community Collection Model. Retrieved 22 October 2018, from RunCoCo website: http://blogs.it.ox.ac.uk/runcoco/2013/06/24/the-oxford-community-collection-model/

Bernstein, S. (2014). Crowdsourcing in Brooklyn. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Bitgood, S. (2010). An attention-value model of museum visitors (pp. 1–29). Retrieved from Center for the Advancement of Informal Science Education website: http://caise.insci.org/uploads/docs/VSA_Bitgood.pdf

Bonney, R., Ballard, H., Jordan, R., McCallie, E., Phillips, T., Shirk, J., & Wilderman, C. C. (2009). Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report (pp. 1–58). Retrieved from Center for Advancement of Informal Science Education (CAISE) website: http://caise.insci.org/uploads/docs/PPSR%20report%20FINAL.pdf

Brohan, P. (2012, July 23). One million, six hundred thousand new observations. Retrieved 30 October 2012, from Old Weather Blog website: http://blog.oldweather.org/2012/07/23/one-million-six-hundred-thousand-new-observations/

Brohan, P. (2014, August 18). In search of lost weather. Retrieved 5 September 2014, from Old Weather Blog website: http://blog.oldweather.org/2014/08/18/in-search-of-lost-weather/

Brumfield, B. W. (2012a, March 5). Quality Control for Crowdsourced Transcription. Retrieved 9 October 2013, from Collaborative Manuscript Transcription website: http://manuscripttranscription.blogspot.co.uk/2012/03/quality-control-for-crowdsourced.html

Brumfield, B. W. (2012b, March 17). Crowdsourcing at IMLS WebWise 2012. Retrieved 8 September 2014, from Collaborative Manuscript Transcription website: http://manuscripttranscription.blogspot.com.au/2012/03/crowdsourcing-at-imls-webwise-2012.html

Budiu, R. (2014, March 2). Login Walls Stop Users in Their Tracks. Retrieved 7 March 2014, from Nielsen Norman Group website: http://www.nngroup.com/articles/login-walls/

Causer, T., & Terras, M. (2014). ‘Many Hands Make Light Work. Many Hands Together Make Merry Work’: Transcribe Bentham and Crowdsourcing Manuscript Collections. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Causer, T., & Wallace, V. (2012). Building A Volunteer Community: Results and Findings from Transcribe Bentham. Digital Humanities Quarterly, 6(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html

Cheng, J., Teevan, J., Iqbal, S. T., & Bernstein, M. S. (2015, April). Break It Down: A Comparison of Macro- and Microtasks. 4061–4064. https://doi.org/10.1145/2702123.2702146

Clary, E. G., Snyder, M., Ridge, R. D., Copeland, J., Stukas, A. A., Haugen, J., & Miene, P. (1998). Understanding and assessing the motivations of volunteers: A functional approach. Journal of Personality and Social Psychology, 74(6), 1516–1530.

Collings, R. (2014, May 5). The art of computer image recognition. Retrieved 25 May 2014, from The Public Catalogue Foundation website: http://www.thepcf.org.uk/what_we_do/48/reference/862

Collings, R. (2015, February 1). The art of computer recognition. Retrieved 22 October 2018, from Art UK website: https://artuk.org/about/blog/the-art-of-computer-recognition

Crowdsourcing Consortium. (2015). Engaging the Public: Best Practices for Crowdsourcing Across the Disciplines. Retrieved from http://crowdconsortium.org/

Crowley, E. J., & Zisserman, A. (2016). The Art of Detection. Presented at the Workshop on Computer Vision for Art Analysis, ECCV. Retrieved from https://www.robots.ox.ac.uk/~vgg/publications/2016/Crowley16/crowley16.pdf

Csikszentmihalyi, M., & Hermanson, K. (1995). Intrinsic Motivation in Museums: Why Does One Want to Learn? In J. Falk & L. D. Dierking (Eds.), Public institutions for personal learning: Establishing a research agenda (pp. 66–77). Washington D.C.: American Association of Museums.

Dafis, L. L., Hughes, L. M., & James, R. (2014). What’s Welsh for ‘Crowdsourcing’? Citizen Science and Community Engagement at the National Library of Wales. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Das Gupta, V., Rooney, N., & Schreibman, S. (n.d.). Notes from the Transcription Desk: Modes of engagement between the community and the resource of the Letters of 1916. Digital Humanities 2016: Conference Abstracts. Presented at the Digital Humanities 2016, Kraków. Retrieved from http://dh2016.adho.org/abstracts/228

De Benetti, T. (2011, June 16). The secrets of Digitalkoot: Lessons learned crowdsourcing data entry to 50,000 people (for free). Retrieved 9 January 2012, from Microtask website: http://blog.microtask.com/2011/06/the-secrets-of-digitalkoot-lessons-learned-crowdsourcing-data-entry-to-50000-people-for-free/

de Boer, V., Hildebrand, M., Aroyo, L., De Leenheer, P., Dijkshoorn, C., Tesfa, B., & Schreiber, G. (2012). Nichesourcing: Harnessing the power of crowds of experts. Proceedings of the 18th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2012, 16–20. Retrieved from http://dx.doi.org/10.1007/978-3-642-33876-2_3

DH2016 Expert Workshop. (2016, July 12). DH2016 Crowdsourcing workshop session overview. Retrieved 5 October 2018, from DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? website: https://docs.google.com/document/d/1sTII8P67mOFKWxCaAKd8SeF56PzKcklxG7KDfCRUF-8/edit?usp=drive_open&ouid=0&usp=embed_facebook

Dillon-Scott, P. (2011, March 31). How Europeana, crowdsourcing & wiki principles are preserving European history. Retrieved 15 February 2015, from The Sociable website: http://sociable.co/business/how-europeana-crowdsourcing-wiki-principles-are-preserving-european-history/

DiMeo, M. (2014, February 3). First Monday Library Chat: University of Iowa’s DIY History. Retrieved 7 September 2014, from The Recipes Project website: http://recipes.hypotheses.org/3216

Dunn, S., & Hedges, M. (2012). Crowd-Sourcing Scoping Study: Engaging the Crowd with Humanities Research (p. 56). Retrieved from King’s College website: http://www.humanitiescrowds.org

Dunn, S., & Hedges, M. (2013). Crowd-sourcing as a Component of Humanities Research Infrastructures. International Journal of Humanities and Arts Computing, 7(1–2), 147–169. https://doi.org/10.3366/ijhac.2013.0086

Durkin, P. (2017, September 28). Release notes: A big antedating for white lie – and introducing Shakespeare’s world. Retrieved 29 September 2017, from Oxford English Dictionary website: http://public.oed.com/the-oed-today/recent-updates-to-the-oed/september-2017-update/release-notes-white-lie-and-shakespeares-world/

Eccles, K., & Greg, A. (2014). Your Paintings Tagger: Crowdsourcing Descriptive Metadata for a National Virtual Collection. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Edwards, D., & Graham, M. (2006). Museum volunteers and heritage sectors. Australian Journal on Volunteering, 11(1), 19–27.

European Citizen Science Association. (2015). 10 Principles of Citizen Science. Retrieved from https://ecsa.citizen-science.net/sites/default/files/ecsa_ten_principles_of_citizen_science.pdf

Eveleigh, A., Jennett, C., Blandford, A., Brohan, P., & Cox, A. L. (2014). Designing for dabblers and deterring drop-outs in citizen science. 2985–2994. https://doi.org/10.1145/2556288.2557262

Eveleigh, A., Jennett, C., Lynn, S., & Cox, A. L. (2013). I want to be a captain! I want to be a captain!: Gamification in the old weather citizen science project. Proceedings of the First International Conference on Gameful Design, Research, and Applications, 79–82. Retrieved from http://dl.acm.org/citation.cfm?id=2583019

Ferriter, M., Rosenfeld, C., Boomer, D., Burgess, C., Leachman, S., Leachman, V., … Shuler, M. E. (2016). We learn together: Crowdsourcing as practice and method in the Smithsonian Transcription Center. Collections, 12(2), 207–225. https://doi.org/10.1177/155019061601200213

Fleet, C., Kowal, K., & Přidal, P. (2012). Georeferencer: Crowdsourced Georeferencing for Map Library Collections. D-Lib Magazine, 18(11/12). https://doi.org/10.1045/november2012-fleet

Forum posters. (2010–present). Signs of OW addiction … Retrieved 11 April 2014, from Old Weather Forum » Shore Leave » Dockside Cafe website: http://forum.oldweather.org/index.php?topic=1432.0

Fugelstad, P., Dwyer, P., Filson Moses, J., Kim, J. S., Mannino, C. A., Terveen, L., & Snyder, M. (2012). What Makes Users Rate (Share, Tag, Edit…)? Predicting Patterns of Participation in Online Communities. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 969–978. Retrieved from http://dl.acm.org/citation.cfm?id=2145349

Gilliver, P. (2012, October 4). ‘Your dictionary needs you’: A brief history of the OED’s appeals to the public. Retrieved from Oxford English Dictionary website: https://public.oed.com/history/history-of-the-appeals/

Goldstein, D. (1994). ‘Yours for Science’: The Smithsonian Institution’s Correspondents and the Shape of Scientific Community in Nineteenth-Century America. Isis, 85(4), 573–599.

Grayson, R. (2016). A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front. British Journal for Military History, 2(2). Retrieved from http://bjmh.org.uk/index.php/bjmh/article/view/96

Hess, W. (2010, February 16). Onboarding: Designing Welcoming First Experiences. Retrieved 29 July 2014, from UX Magazine website: http://uxmag.com/articles/onboarding-designing-welcoming-first-experiences

Holley, R. (2009, March). Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers. Canberra: National Library of Australia.

Holley, R. (2010). Crowdsourcing: How and Why Should Libraries Do It? D-Lib Magazine, 16(3/4). https://doi.org/10.1045/march2010-holley

Holmes, K. (2003). Volunteers in the heritage sector: A neglected audience? International Journal of Heritage Studies, 9(4), 341–355. https://doi.org/10.1080/1352725022000155072

Kittur, A., Nickerson, J. V., Bernstein, M., Gerber, E., Shaw, A., Zimmerman, J., … Horton, J. (2013). The future of crowd work. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 1301–1318. Retrieved from http://dl.acm.org/citation.cfm?id=2441923

Lambert, S., Winter, M., & Blume, P. (2014, March 26). Getting to where we are now. Retrieved 4 March 2015, from 10most.org.uk website: http://10most.org.uk/content/getting-where-we-are-now

Lascarides, M., & Vershbow, B. (2014). What’s on the menu?: Crowdsourcing at the New York Public Library. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Latimer, J. (2009, February 25). Letter in the Attic: Lessons learnt from the project. Retrieved 17 April 2014, from My Brighton and Hove website: http://www.mybrightonandhove.org.uk/page/letterintheatticlessons?path=0p116p1543p

Lazy Registration design pattern. (n.d.). Retrieved 9 December 2018, from UI-Patterns website: http://ui-patterns.com/patterns/LazyRegistration

Leon, S. M. (2014). Build, Analyse and Generalise: Community Transcription of the Papers of the War Department and the Development of Scripto. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38(1), 43–52.

McGonigal, J. (n.d.). Gaming the Future of Museums. Retrieved from http://www.slideshare.net/avantgame/gaming-the-future-of-museums-a-lecture-by-jane-mcgonigal-presentation#text-version

Mills, E. (2017, December). The Flitch of Bacon: An Unexpected Journey Through the Collections of the British Library. Retrieved 17 August 2018, from British Library Digital Scholarship blog website: http://blogs.bl.uk/digital-scholarship/2017/12/the-flitch-of-bacon-an-unexpected-journey-through-the-collections-of-the-british-library.html

Mitra, T., & Gilbert, E. (2014). The Language that Gets People to Give: Phrases that Predict Success on Kickstarter. Retrieved from http://comp.social.gatech.edu/papers/cscw14.crowdfunding.mitra.pdf

Mugar, G., Østerlund, C., Hassman, K. D., Crowston, K., & Jackson, C. B. (2014). Planet Hunters and Seafloor Explorers: Legitimate Peripheral Participation Through Practice Proxies in Online Citizen Science. Retrieved from http://crowston.syr.edu/sites/crowston.syr.edu/files/paper_revised%20copy%20to%20post.pdf

Mugar, G., Østerlund, C., Jackson, C. B., & Crowston, K. (2015). Being Present in Online Communities: Learning in Citizen Science. Proceedings of the 7th International Conference on Communities and Technologies, 129–138. https://doi.org/10.1145/2768545.2768555

Museums, Libraries and Archives Council. (2008). Generic Learning Outcomes. Retrieved 8 September 2014, from Inspiring Learning website: http://www.inspiringlearningforall.gov.uk/toolstemplates/genericlearning/

National Archives of Australia. (n.d.). ArcHIVE – homepage. Retrieved 18 June 2014, from ArcHIVE website: http://transcribe.naa.gov.au/

Nielsen, J. (1995). 10 Usability Heuristics for User Interface Design. Retrieved 29 April 2014, from http://www.nngroup.com/articles/ten-usability-heuristics/

Nov, O., Arazy, O., & Anderson, D. (2011). Technology-Mediated Citizen Science Participation: A Motivational Model. Proceedings of the AAAI International Conference on Weblogs and Social Media. Barcelona, Spain.

Oomen, J., Gligorov, R., & Hildebrand, M. (2014). Waisda?: Making Videos Findable through Crowdsourced Annotations. In M. Ridge (Ed.), Crowdsourcing Our Cultural Heritage. Retrieved from http://www.ashgate.com/isbn/9781472410221

Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive Load Theory and Instructional Design: Recent Developments. Educational Psychologist, 38(1), 1–4. https://doi.org/10.1207/S15326985EP3801_1

Part I: Building a Great Project. (n.d.). Retrieved 9 December 2018, from Zooniverse Help website: https://help.zooniverse.org/best-practices/1-great-project/

Preist, C., Massung, E., & Coyle, D. (2014). Competing or aiming to be average?: Normification as a means of engaging digital volunteers. Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, 1222–1233. https://doi.org/10.1145/2531602.2531615

Raddick, M. J., Bracey, G., Gay, P. L., Lintott, C. J., Murray, P., Schawinski, K., … Vandenberg, J. (2010). Galaxy Zoo: Exploring the Motivations of Citizen Science Volunteers. Astronomy Education Review, 9(1), 18.

Raimond, Y., Smethurst, M., & Ferne, T. (2014, September 15). What we learnt by crowdsourcing the World Service archive. Retrieved 15 September 2014, from BBC R&D website: http://www.bbc.co.uk/rd/blog/2014/08/data-generated-by-the-world-service-archive-experiment-draft

Reside, D. (2014). Crowdsourcing Performing Arts History with NYPL’s ENSEMBLE. Presented at Digital Humanities 2014. Retrieved from http://dharchive.org/paper/DH2014/Paper-131.xml

Ridge, M. (2011a). Playing with Difficult Objects – Game Designs to Improve Museum Collections. In J. Trant & D. Bearman (Eds.), Museums and the Web 2011: Proceedings. Retrieved from http://www.museumsandtheweb.com/mw2011/papers/playing_with_difficult_objects_game_designs_to

Ridge, M. (2011b). Playing with difficult objects: Game designs for crowdsourcing museum metadata (MSc Dissertation, City University London). Retrieved from http://www.miaridge.com/my-msc-dissertation-crowdsourcing-games-for-museums/

Ridge, M. (2013). From Tagging to Theorizing: Deepening Engagement with Cultural Heritage through Crowdsourcing. Curator: The Museum Journal, 56(4).

Ridge, M. (2014, November). Citizen History and its discontents. Presented at the IHR Digital History Seminar, Institute for Historical Research, London. Retrieved from https://hcommons.org/deposits/item/hc:17907/

Ridge, M. (2015). Making digital history: The impact of digitality on public participation and scholarly practices in historical research (Ph.D., Open University). Retrieved from http://oro.open.ac.uk/45519/

Ridge, M. (2018). British Library Digital Scholarship course 105: Exercises for Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions. Retrieved from https://docs.google.com/document/d/1tx-qULCDhNdH0JyURqXERoPFzWuCreXAsiwHlUKVa9w/

Rotman, D., Preece, J., Hammock, J., Procita, K., Hansen, D., Parr, C., … Jacobs, D. (2012). Dynamic changes in motivation in collaborative citizen-science projects. Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, 217–226. https://doi.org/10.1145/2145204.2145238

Sample Ward, A. (2011, May 18). Crowdsourcing vs Community-sourcing: What’s the difference and the opportunity? Retrieved 6 January 2013, from Amy Sample Ward’s Version of NPTech website: http://amysampleward.org/2011/05/18/crowdsourcing-vs-community-sourcing-whats-the-difference-and-the-opportunity/

Schmitt, J. R., Wang, J., Fischer, D. A., Jek, K. J., Moriarty, J. C., Boyajian, T. S., … Socolovsky, M. (2014). Planet Hunters. VI. An Independent Characterization of KOI-351 and Several Long Period Planet Candidates from the Kepler Archival Data. The Astronomical Journal, 148(2), 28. https://doi.org/10.1088/0004-6256/148/2/28

Secord, A. (1994). Corresponding interests: Artisans and gentlemen in nineteenth-century natural history. The British Journal for the History of Science, 27(4), 383–408. https://doi.org/10.1017/S0007087400032416

Shakespeare’s World Talk #OED. (Ongoing). Retrieved 21 April 2019, from https://www.zooniverse.org/projects/zooniverse/shakespeares-world/talk/239

Sharma, P., & Hannafin, M. J. (2007). Scaffolding in technology-enhanced learning environments. Interactive Learning Environments, 15(1), 27–46. https://doi.org/10.1080/10494820600996972

Shirky, C. (2011). Cognitive surplus: Creativity and generosity in a connected age. London, U.K.: Penguin.

Silvertown, J. (2009). A new dawn for citizen science. Trends in Ecology & Evolution, 24(9), 467–471. https://doi.org/10.1016/j.tree.2009.03.017

Simmons, B. (2015, August 24). Measuring Success in Citizen Science Projects, Part 2: Results. Retrieved 28 August 2015, from Zooniverse website: https://blog.zooniverse.org/2015/08/24/measuring-success-in-citizen-science-projects-part-2-results/

Simon, N. K. (2010). The Participatory Museum. Retrieved from http://www.participatorymuseum.org/chapter4/

Smart, P. R., Simperl, E., & Shadbolt, N. (2014). A Taxonomic Framework for Social Machines. In D. Miorandi, V. Maltese, M. Rovatsos, A. Nijholt, & J. Stewart (Eds.), Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society. Retrieved from http://eprints.soton.ac.uk/362359/

Smithsonian Institution Archives. (2012, March 21). Meteorology. Retrieved 25 November 2017, from Smithsonian Institution Archives website: https://siarchives.si.edu/history/featured-topics/henry/meteorology

Springer, M., Dulabahn, B., Michel, P., Natanson, B., Reser, D., Woodward, D., & Zinkham, H. (2008). For the Common Good: The Library of Congress Flickr Pilot Project (pp. 1–55). Retrieved from Library of Congress website: http://www.loc.gov/rr/print/flickr_report_final.pdf

Stebbins, R. A. (1997). Casual leisure: A conceptual statement. Leisure Studies, 16(1), 17–25. https://doi.org/10.1080/026143697375485

The Culture and Sport Evidence (CASE) programme. (2011, January). Evidence of what works: Evaluated projects to drive up engagement (p. 19). Retrieved from Culture and Sport Evidence (CASE) programme website: http://www.culture.gov.uk/images/research/evidence_of_what_works.pdf

Trant, J. (2009). Tagging, Folksonomy and Art Museums: Results of steve.museum’s research (p. 197). Retrieved from Archives & Museum Informatics website: https://web.archive.org/web/20100210192354/http://conference.archimuse.com/files/trantSteveResearchReport2008.pdf

United States Government. (n.d.). Federal Crowdsourcing and Citizen Science Toolkit. Retrieved 9 December 2018, from CitizenScience.gov website: https://www.citizenscience.gov/toolkit/

Van Merriënboer, J. J. G., Kirschner, P. A., & Kester, L. (2003). Taking the load off a learner’s mind: Instructional design for complex learning. Educational Psychologist, 38(1), 5–13.

Vander Wal, T. (2007, February 2). Folksonomy. Retrieved 8 December 2018, from Vanderwal.net website: http://vanderwal.net/folksonomy.html

Veldhuizen, B., & Keinan-Schoonbaert, A. (2015, February 11). MicroPasts: Crowdsourcing Cultural Heritage Research. Retrieved 8 December 2018, from Sketchfab Blog website: https://blog.sketchfab.com/micropasts-crowdsourcing-cultural-heritage-research/

Verwayen, H., Fallon, J., Schellenberg, J., & Kyrou, P. (2017). Impact Playbook for museums, libraries and archives. Europeana Foundation.

Vetter, J. (2011). Introduction: Lay Participation in the History of Scientific Observation. Science in Context, 24(2), 127–141. https://doi.org/10.1017/S0269889711000032

von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 57. https://doi.org/10.1145/1378704.1378719

Wenger, E. (2010). Communities of practice and social learning systems: The career of a concept. In Social Learning Systems and communities of practice. Springer Verlag and the Open University.

Whitenton, K. (2013, December 22). Minimize Cognitive Load to Maximize Usability. Retrieved 12 September 2014, from Nielsen Norman Group website: http://www.nngroup.com/articles/minimize-cognitive-load/

WieWasWie Project informatie. (n.d.). Retrieved 1 August 2014, from VeleHanden website: http://velehanden.nl/projecten/bekijk/details/project/wiewaswie_bvr

Willett, K. (n.d.). New paper: Galaxy Zoo and machine learning. Retrieved 31 March 2015, from Galaxy Zoo website: http://blog.galaxyzoo.org/2015/03/31/new-paper-galaxy-zoo-and-machine-learning/

Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 17(2), 89–100.

What big topics in Digital Humanities should a reading group discuss in 2021?

This is a thrown-together post to capture responses to a question I asked on Twitter last week. The Digital Scholarship Reading Group I run at the British Library will spend the first meeting of 2021 collaboratively planning topics to discuss in the rest of the year, so to broaden my understanding of what might be discussed, I posted, 'A question for people interested / working in Digital Humanities – what do you think are the big topics for 2021? Or what's not, but should be a focus? … New publications or conference papers welcome!'.

And since I was asking people for suggestions, it seemed like the right time to share something we'd been thinking about for a while: 'we've decided to open our discussions to people outside the British Library / Turing Institute! We'll alternate between 11am-12pm and 3-4pm meeting times on the first Tuesday of each month'. I haven't sorted the logistics for signing up – should it be on a session by session basis, or should we just add people's email address to the generic meeting request so they get the updates? (Will they get the updates, given how defensive and awful email is for collaboration these days?)

I also posted links: 'For context, here's what we read up to early 2018: "What do deep learning, community archives, Livy and the politics of artefacts have in common?", and a themed summary, "Readings at the intersection of digital scholarship and anti-racism"'.

Responses to date are below. I didn't want to faff about with embedded tweets because they're more likely to break over time, so I've just indented replies with the username at the start.

Claire Boardman @boardman_claire The environmental impact of DH? Conversational AI and collections?

Jajwalya Karajgikar @JajRK Large language models, and computational text analysis overall?

@mia_out As in models that use very large amounts of training data? And yes, we should do more on CTA, I think we could probably get broader coverage of methods, thanks for the prompt!

Jajwalya Karajgikar @JajRK Models that use deep learning for language prediction; GPT-3 I think someone mentioned on the thread already?

Thomas Padilla @thomasgpadilla Social justice and DH – though all work that frames current strife as a new thing vs. a longstanding pervasive reality should be tossed into an abyss to make way for others

@mia_out I won't ask you to name and shame bad pieces, but let me know if you have any favs that do it well!

Thomas Padilla @thomasgpadilla Ha! On the collections side @dorothyjberry has a piece or two brewing. @ess_ell_zee work here is good too I think https://journal.code4lib.org/articles/14667

@artepublico peeps like @gbaezaventura and @rayenchil and the @MellonFdn supported Latinx DH program are good places to look

Same goes for @profgabrielle and all the @CCP_org work is fantastic

Wilhelmina Randtke @randtke Long term sustained funding. Acknowledging, and even compiling a list of, projects that have had resources eliminated or been completely discontinued since March.

Jenny Fewster @Fewster Absolutely! This is a problem internationally. Dig hums projects set up with one off funding that then aren’t sustained. Unfortunately digital projects are not a “set and forget” prospect. It’s a colossal waste of time, effort, knowledge and money

Matthew Hannah @TinkeringHuman I think we need/will see more work about the limits of neoliberal capitalism, the academy, and DH, applications of critical university studies and Marxist theory. Esp as higher ed continues to implode.

@mia_out Sounds very timely! I don't suppose you have any papers or presentations in mind?

Matthew Hannah @TinkeringHuman Claire Potter’s piece in Radical Teacher is also an inspiration: https://pdfs.semanticscholar.org/c3c0/b0f853710a56b13b0d232b3b435a19bf59a7.pdf

But we need more engagement I think around the question of precarity and economics imo

Johan Oomen @johanoomen Detecting polyvocality in heritage collections and navigating this underexplored dimension to investigate shifting viewpoints over time. Could also be a great opportunity for crowdsourcing projects, to encourage contemporary users to voice their opinions on contentious topics.

@mia_out Ooh, that's a really juicy one – lots of potential and lots of pitfalls

Erik Champion @nzerik The influence of social media on politics? The failure of social media apps, webchat etc to compensate for lockdown distancing? Govt and corp control on personal data? Big companies controlling VR devices and personal +physiological data?

@mia_out As seen recently when people were annoyed they had to do a Google Recaptcha on a COVID test site

Erik Champion @nzerik Bots need vaccines too! (Equality for bots trojans and spam machines #101)

Alexander Doria @Dorialexander On the technical side, optical manuscript recognition and layout analysis (especially for newspaper archives): mature tools are just emerging and that can change a lot in terms of corpus availability, research directions and digitization choices.

@mia_out There is so much interesting work on newspapers right now! It feels like scholarship is going to have a quite different starting point in just a few years. Periodicals less so, maybe because they're more specialised and less (family history) name rich?

Alexander Doria @Dorialexander Yes that's true. Perhaps also because they are less challenging both technically and intellectually (it's not that much of a stretch to go from book studies to the periodicals).

Alexander Doria @Dorialexander (On the social side I would say there is a long overdue uncomfortable discussion about the reliance of the field to diverse forms of digital labors: from the production of digitized archives in developing countries to the large use of students as a cheap/unpaid labor force)

@mia_out That ties in with ideas from @TinkeringHuman

Max Kemman @MaxKemman I think we'll be seeing more about Computational Humanities and how it relates to Digital Humanities, for which a good starting point will be the @CompHumResearch conference proceedings http://ceur-ws.org/Vol-2723/

@mia_out Ooh, we could have a debate or discussion about the difference!

Lauren Tilton @nolauren And intersection/ difference from Data Science

@mia_out Good call, the lines are becoming increasingly blurred, hopefully in more good ways than bad

Gabriel Hankins @GabrielHankins GPT-3 and algorithmic composition. Interested in the conversation if you open it!

And finally, one reason I collected these responses was:

Michael Lascarides @mlascarides A feature I wish Twitter had: When I see someone influential in a domain I'm interested in ask a really great question, I want to bookmark that question to return to in a couple of days once the responses have come in. It's a use case a bit more specific than a "like".

Michael Lascarides @mlascarides Inspired most recently by [my] Q, but it comes up about once a week for me

Notes from Digital Humanities 2019 (DH2019 Utrecht)

My rough notes from the Digital Humanities 2019 conference in Utrecht. All the usual warnings about partial attention / tendency for distraction apply. My comments are usually in brackets.

I found the most useful reference for the conference programme to be https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&presentations=show but it doesn't show the titles or abstracts for papers within panels.

Some places my colleagues and I were during the conference: https://blogs.bl.uk/digital-scholarship/2019/07/british-library-digital-scholarship-at-digital-humanities-2019-.html http://livingwithmachines.ac.uk/living-with-machines-at-digital-humanities-2019/

DH2019 Keynote by Francis B. Nyamnjoh, 'African Inspiration for Understanding the Compositeness of Being Human through Digital Technology'

https://dh2019.adho.org/wp-content/uploads/2019/07/Nyamnjoh_Digital-Humanities-Keynote_2019.pdf

Notions of complexity and incompleteness are familiar in Africa; Africans frown on attempts to simplify.

How do notions of incompleteness provide food for thought in digital humanities?

Nyamnjoh decries the sense of superiority inspired by zero sum games. 'Humans are incomplete, nature is incomplete. Religious bit. No one can escape incompleteness.' (Phew! This is something of a mantra when you work with collections at scale – working in cultural institutions comes with a daily sense that the work is so large it will continue after you're just a memory. Let's embrace rather than apologise for it)

References books by Amos Tutuola

Nyamnjoh on hidden persuaders, activators. Juju as a technology of self-extension. With juju, you can extend your presence; rise beyond ordinary ways of being. But it can also be spyware. (Timely, on the day that Zoom was found to allow access to your laptop camera – this has positives and negatives)

Nyamnjoh: DH as the compositeness of being; being incomplete is something to celebrate. Proposes a scholarship of conviviality that takes in practices from different academic disciplines to make itself better.

Nyamnjoh in response to Micki K's question about history as a zero-sum game in which people argue whether something did or didn't happen: create archives that can tell multiple stories, complexify the stories that exist

DH2019 Day 1, July 10

LP-03: Space Territory GeoHumanities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=455&presentations=show Locating Absence with Narrative Digital Maps

How to combine new media production with DH methodologies to create kit for recording and locating in the field.

Why georeference? To situate context, compare old and new maps, extract features, or explore map complexity.

Maps Re-imagined: Digital, Informational, and Perceptional Experimentations in Progress by Tyng-Ruey Chuang, Chih-Chuan Hsu, Huang-Sin Syu used OpenStreetMap with historical Taiwanese maps. Interesting base map options, including a ukiyo-e style: https://bcfuture.github.io/tileserver/Switch.html

Oceanic Exchanges: Transnational Textual Migration And Viral Culture

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=477&presentations=show Oceanic Exchanges studies the flow of information, searching for historical-literary connections between newspapers around the world; it seeks to push the boundaries of research with newspapers.

Challenges: imperfect comparability of corpora – data is provided in different ways by each data provider; no unifying ontology between archives (no generic identification of specific items); legal restrictions; TEI and other work hasn't been suitable for newspaper research
Limited ability to conduct research across repositories. Deep semantic multilingual text mining remains a challenge. Political (national) and practical organisation of archives currently determines questions that can be asked, privileges certain kinds of enquiry.
Oceanic Exchanges project includes over 100 million pages. Corpus exploration tool needed to support: exploring data (metadata and text); other things that went by too quickly.

The Past, Present and Future of Digital Scholarship with Newspaper Collections

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=483&presentations=show

I was on this panel so I tweeted a bit but have no notes myself.

My abstract: https://www.openobjects.org.uk/2019/07/the-past-present-and-future-of-digital-scholarship-with-newspaper-collections/
My slides: https://www.slideshare.net/miaridge/living-with-machines-at-the-past-present-and-future-of-digital-scholarship-with-newspaper-collections-154700888
See also: http://livingwithmachines.ac.uk/living-with-machines-at-digital-humanities-2019/
@RossiAtanassova Laurel Brake: A researcher's wish list for digitised newspaper journals pic.twitter.com/rNmuuBOFb8
@giovanni1085 @printjournalism list of existing (and very much felt) problems/challenges for digital media history. But, there is hope and we persevere #DH2019 pic.twitter.com/LSilbMi9vg
@juliannenyhan Crucial points by @Ajprescott about necessity of developing critical frameworks for scholarship with digital newspapers that assist in helping us understand how & why digital newspaper collections take form they do & how e.g. power, bias & absence act on and through them #dh2019 pic.twitter.com/WSfjC2aq2t

Working with historical text (digitised newspapers, books, whatever) collections at scale has some interesting challenges and rewards. Inspired by all the newspaper sessions? Join an emerging community of practitioners, researchers and critical friends via this document from a 'DH2019 Lunch session – Researchers & Libraries working together on improving digitised newspapers' https://docs.google.com/document/d/1JJJOjasuos4yJULpquXt8pzpktwlYpOKrRBrCds8r2g/edit

Zotero group for Historical Periodicals https://www.zotero.org/groups/704613/historical_periodicals
Discussion list https://groups.google.com/forum/#!forum/digital-historical-periodica

Complexities, Explainability and Method

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=486&presentations=show I enjoyed listening to this panel which is so far removed from my everyday DH practice.

Other stuff

Tweet: If you ask a library professional about digitisating (new word alert!) a specific collection and they appear to go quiet, this is actually what they're doing – digitisation takes shedloads of time and paperwork https://twitter.com/CamDigLib/status/1148888628405395456

Posters

'Why engage with digital source criticism' seems particularly relevant to our 'fake news' era https://ranke2.uni.lu pic.twitter.com/JxV83RFTY6
Good 'DH for fun' poster https://correspsearch.net/quotesalute/ 'inspiring greetings for your correspondence'
I missed the special poster session, so I was excited to see PDFs and links at 'Digital Humanities – the perspective of Africa' https://dhafrica.blog/outcomes/

@LibsDH ADHO Lib & DH SIG meetup

There was a lunchtime meeting for 'Libraries and Digital Humanities: an ADHO Special Interest Group', which was a lovely chance to talk libraries / GLAMs and DH. You can join the group via https://docs.google.com/forms/d/e/1FAIpQLSfswiaEnmS_mBTfL3Bc8fJsY5zxhY7xw0auYMCGY_2R0MT06w/viewform or the mailing list at http://lists.digitalhumanities.org/mailman/listinfo/libdh-sig

DH2019 Day 2, July 11

XR in DH: Extended Reality in the Digital Humanities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=523&presentations=show

Another panel where I enjoyed listening and learning about a field I haven't explored in depth. Tweet from the Q&A: 'Love the 'XR in DH: Extended Reality in the Digital Humanities' panel responses to a question about training students only for them to go off and get jobs in industry: good! Industry needs diversity, PhDs need to support multiple career paths beyond academia'

Data Science & Digital Humanities: new collaborations, new opportunities and new complexities

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=532&presentations=show Beatrice Alex, Anne Alexander, David Beavan, Eirini Goudarouli, Leonardo Impett, Barbara McGillivray, Nora McGregor, Mia Ridge

My work with open cultural data has led to me asking 'how can GLAMs and data scientists collaborate to produce outcomes that are useful for both?'. Following this, I presented a short paper, more info at https://www.openobjects.org.uk/2019/07/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science/ https://www.slideshare.net/miaridge/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science.

As summarised in tweets:

https://twitter.com/semames1/status/1149250799232540672, 'data science can provide new routes into library collections; libraries can provide new challenging sources of information (scale, untidy data) for data scientists';
https://twitter.com/sp_meta/status/1149251010025656321 'library staff are often assessed by strict metrics of performance – items catalog, speed of delivery to reading room – that isn’t well-matched to messy, experimental collaborations with data scientists';
https://twitter.com/melissaterras/status/1149251480576303109 'Copyright issues are inescapable… they are the background noise to what we do';
https://twitter.com/sp_meta/status/1149251656720289792 'How can library infrastructure change to enable collaboration with data scientists, encouraging use of collections as data and prompting researchers to share their data and interpretations back?';
(me) 'I'm wondering about this dichotomy between 'new' or novel, and 'useful' or applied – is there actually a sweet spot where data scientists can work with DH / GLAMs or should we just apply data science methods and also offer collections for novel data science research? Thinking of it as a scale of different aspects of 'new to applied research' rather than a simple either/or'.

SP-19: Cultural Heritage, Art/ifacts and Institutions

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=462&presentations=show

“Un Manuscrit Naturellement”: Rescuing a library buried in digital sand

1979, agreement with Ministry of Culture and IRHT to digitise all manuscripts stored in French public libraries. (Began with microfilm, not digital). Safe, but not usable. Financial cost of preserving 40TB of data was prohibitive, but BnF started converting TIFFs to JP2 which made storage financially feasible. Huge investment by France in data preservation for digitised manuscripts.
Big data cleaning and deduplication process, got rid of 1 million files. Discovered errors in TIFFs when converting to JP2. Found inconsistencies in metadata between databases and files. 3 years to do the prep work and clean the data!
‘A project which lasts for 40 years produces a lot of variabilities’. Needed a team, access to proper infrastructure; the person with memory of the project was key.

A Database of Islamic Scientific Manuscripts — Challenges of Past and Future

(Following on from the last paper, digital preservation takes continuous effort). Moving to RDF model based on CIDOC-CRM, standard triple store database, standard ResearchSpace/Metaphactory front end. Trying to separate the data from the software to make maintenance easier.

Analytical Edition Detection In Bibliographic Metadata; The Emerging Paradigm of Bibliographic Data Science

Tweet: Two solid papers on a database for Islamic Scientific Manuscripts and data science work with the ESTC (English Short Title Catalogue) plus reflections on the need for continuous investment in digital preservation. Back on familiar curatorial / #MuseTech ground!
Lahti – Reconciling / data harmonisation for early modern books is so complex that there are different researchers working on editions, authors, publishers, places

Syriac Persons, Events, and Relations: A Linked Open Factoid-based Prosopography

Prosopography and factoids. His project relies heavily on authority files that http://syriaca.org/ produces. Modelling factoids in TEI; usually it’s done in relational databases.
Prosopography used to be published as snippets of narrative text about people for whom enough information was available.
Factoid – a discrete piece of prosopographical information asserted in a primary source text and sourced to that text.
Person, event and relation factoids. Researcher attribution at the factoid level. Using TEI because (as markup around the text) it stays close to the primary source material; can link out to controlled vocabulary
Srophe app – an open source platform for cultural heritage data used to present their prosopographical data https://srophe.app/
Harold Short says how pleased he is to hear a project like that taking the approach they have; TEI wasn’t available as an option when they did the original work (seriously beautiful moment)
Why SNAP? ‘FOAF isn’t really good at describing relationships that have come about as a result of slave ownership’
More on factoid prosopography via Arianna Ciula https://factoid-dighum.kcl.ac.uk/
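To make the factoid idea a bit more concrete, here's a rough sketch of a factoid as a data structure. The project itself models factoids in TEI, not Python. The field names, the helper functions and any `example.org` URIs you'd use with them are hypothetical, chosen to mirror the description above: one assertion, sourced to a primary text, attributed to a researcher, and typed as a person, event or relation factoid.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class FactoidType(Enum):
    PERSON = "person"
    EVENT = "event"
    RELATION = "relation"


@dataclass(frozen=True)
class Factoid:
    """One discrete assertion from a primary source (hypothetical field names)."""
    factoid_type: FactoidType
    persons: Tuple[str, ...]        # authority URIs for the people involved
    assertion: str                  # what the source text says
    source: str                     # citation for the primary source text
    researcher: str                 # who recorded the factoid
    relation: Optional[str] = None  # relationship type, for relation factoids


def factoids_about(factoids: List[Factoid], person_uri: str) -> List[Factoid]:
    """All factoids that mention a given person."""
    return [f for f in factoids if person_uri in f.persons]


def sources_for(factoids: List[Factoid], person_uri: str) -> List[str]:
    """Distinct primary sources that say something about a person."""
    return sorted({f.source for f in factoids_about(factoids, person_uri)})
```

Because each factoid carries its own source and researcher, a 'person' becomes the sum of the factoids pointing at their authority URI, and conflicting assertions from different sources can sit side by side rather than being merged into one narrative.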

Day 3, July 12

Complexities in the Use, Analysis, and Representation of Historical Digital Periodicals

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=527&presentations=show

Torsten Roeder: Tracing debate about a particular work through German music magazines and daily newspapers. OCR and mass digitisation made it easier to compose representative text corpora about specific subjects. Authorship information isn’t available, so you don’t know the writers’ backgrounds etc., which means a different form of analysis is needed. ‘Horizontal reading’ as a metaphor for his approach. Topic modelling didn’t work for looking for music criticism.
Roeder's requirements: accessible digital copies of newspapers; reliable metadata; high quality OCR or transcriptions; article borders; some kind of segmentation; deep semantic annotation – ‘but who does what?’ What should collection holders / access providers do, and what should researchers do? (e.g. who should identify entities and concepts within texts? This question was picked up in other discussion in the session, on twitter and at an impromptu lunchtime meetup)
Zef Segal. The Periodical as a Geographical Space. The relation between the two isn’t unidirectional. Imagined space constructed by the text and its layout. Periodicals construct an imaginary space that refers back to the real. Headlines, paratext, regular text. Divisions between articles. His case study for exploring the issues: HaZefirah. (sample slide image https://twitter.com/mia_out/status/1149581497680052224)
Nanette Rißler-Pipka, Historical Periodicals Research, Opportunities and Limitations. The limitations she encounters as a researcher. Building a corpus of historical periodicals for a research question often means using sources from more than one provider of digitised texts. Different searches, rights, structure. (The need for multiple forms of interoperability, again)
Wants article / ad / genre classifications. For metadata wants, bibliographical data about the title (issue, date); extractable data (dates, names, tables of contents), provenance data (who digitised, when?). When you download individual articles, you lose the metadata which would be so useful for research. Open access is vital; interoperability is important; the ability to create individual collections across individual libraries is a wonderful dream
Estelle Bunout. Impresso providing exploration tools (integrate and decomplexify NLP tools in current historical research workflows). https://impresso-project.ch/app/#/
Working on: expanding a query – find neighbouring terms and frequent OCR errors. Overview of query: where and when is it? Whole corpus has been processed with topic modelling.
Complex queries: help me find the mention of places, countries, person in a particular thematic context. Can save to collection or export for further processing.
See the unsearchable: missing issues, failure to digitise issues, failure to OCRise, corrupt files
Transparency helps researchers discover novel opportunities and make informed decisions about sources.
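Impresso's query expansion wasn't described in detail, so this is only a sketch of one simple way to surface 'frequent OCR errors' for a search term. It assumes you already have a vocabulary of words drawn from the OCR text, and lists every vocabulary word within a small Levenshtein edit distance of the query. The function names and sample words are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def expand_query(term: str, vocabulary, max_distance: int = 1) -> list:
    """Return vocabulary words within `max_distance` edits of the search term."""
    term = term.lower()
    return sorted({w for w in vocabulary
                   if edit_distance(term, w.lower()) <= max_distance})
```

For example, `expand_query("steam", ["steam", "sleam", "stream", "team", "stearn", "coal"])` returns `['sleam', 'steam', 'stream', 'team']`; 'stearn' only appears with `max_distance=2`. Edit distance alone is a blunt instrument: it also returns real words like 'stream' and 'team', and the classic OCR confusion of 'rn' for 'm' costs two edits. In practice you'd probably weigh candidates by corpus frequency or let the researcher choose which variants to keep, which fits the point about transparency above.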
Clifford Wulfman – how to support transcriptions, linked open data that allows exploration of notions of periodicity, notions of the periodical. My tweet: Clifford Wulfman acknowledging that libraries don't have the resources to support special 'snowflake' projects because they're working to meet the most common needs. IME this question/need doesn't go away so how best to tackle and support it?
Q&A comment: what if we just put all newspapers on Impresso? Discussion of standardisation, working jointly, collaborating internationally
Melodee Beals comments: libraries aren’t there just to support academic researchers, academics could look to supporting the work of creative industries, journalists and others to make it easier for libraries to support them.
Subject librarian from Leiden University points out that copyright limits their ability to share newspapers after 1880. (Innovating is hard when you can't even share the data)
Nanette Rißler-Pipka says researchers don't need fancy interfaces, just access to the data (which probably contradicts the need for 'special snowflake' systems and explains why libraries can never ever make all users happy)

LP-34: Cultural Heritage, Art/ifacts and Institutions

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=516&presentations=show

(I was chairing so notes are sketchier)

Mark Hill, early modern (1500-1800 but 18thC in particular) definitions of ‘authorship’. How does authorship interact with structural aspects of publishing? Shift of authorship from gentlemanly to professional occupation.
Using the ESTC, which has about 1m actors and 400k documents with actors attached to them. Actors include authors, editors, publishers, printers, translators, dedicatees. Early modern print trade was ‘trade on a human scale’. People knew each other: ‘hand-operated printing press required individual actors and relationships’.
As time goes on, printers work with fewer people, publishers work with more, and authors work with about the same number.
They manually created a network of people associated with Bernard Mandeville and compared it with a network automatically generated from ESTC.
Looking at a work network for Edmond Hoyle’s Short Treatise on the Game of Whist. (Today I learned that Hoyle's Rules, determiner of victory in family card games and of 'according to Hoyle' fame, dates back to a book on whist in the 18thC)
(Really nice use of social network analysis to highlight changes in publisher and authorship networks.) Eigenvector centrality is very good at finding important actors. In the English Civil War, who you know does matter when it comes to publishing. By the 18thC publishers really matter. See http://ceur-ws.org/Vol-2364/19_paper.pdf for more.
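Eigenvector centrality scores a node highly when it is connected to other well-connected nodes, which is why it's good at picking out important actors in a print-trade network. This is a generic power-iteration sketch on a small undirected, unweighted graph, not the pipeline used in the paper; the actor names in the test are made up. (networkx has an `eigenvector_centrality` function if you'd rather not roll your own.)

```python
import math
from collections import defaultdict


def eigenvector_centrality(edges, iterations: int = 100, tol: float = 1e-9) -> dict:
    """Power iteration on an undirected, unweighted graph given as (a, b) pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    nodes = sorted(neighbours)
    if not nodes:
        return {}
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # New score = own score + sum of neighbours' scores, i.e. iterate with A + I.
        # The shift stops the iteration oscillating on bipartite graphs (e.g. a
        # publisher linked only to authors) without changing the eigenvector.
        new = {n: scores[n] + sum(scores[m] for m in neighbours[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - scores[n]) for n in nodes) < tol:
            return new
        scores = new
    return scores
```

Scores are normalised so their squares sum to 1, so they're only meaningful relative to each other within the same network.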

Richard Freedman, David Fiala, Andrew Janco et al

What is a musical quotation? Borrowing, allusion, parody, commonplace, contrafact, cover, plagiat, sampling, signifying.
Tweet: Freedman et al.'s slides for 'Citations: The Renaissance Imitation Mass (CRIM) and The Quotable Musical Text in a Digital Age' https://bit.ly/CRIM_Utrecht are a rich introduction to applications of #DigitalMusicology encoding and markup
I spend so much time in text worlds that it's really refreshing to hear from musicologists who play music to explain their work and place so much value on listening while also exploiting digital processing tools to the max

Digging Into Pattern Usage Within Jazz Improvisation (Pattern History Explorer, Pattern Search and Similarity Search) Frank Höger, Klaus Frieler, Martin Pfleiderer

'Dig that lick' jazz similarity search engine https://dig-that-lick.hfm-weimar.de/pattern_search/

Impromptu meetup to discuss issues raised around digitised newspapers research and infrastructure

See notes about DH2019 Lunch session – Researchers & Libraries working together on improving digitised newspapers. 20 or more people joined us for a discussion of the wonderful challenges and wish lists from speakers, thinking about how we can collaborate to improve the provision of digitised newspapers / periodicals for researchers.

https://twitter.com/saschel/status/1149640870628483072
Inspired by conversations about digitised newspapers at #DH2019? Think about the points / rants / lessons you’d share in manifestos by/for researchers and GLAMs

Theorising the Spatial Humanities panel

https://www.conftool.pro/dh2019/index.php?page=browseSessions&path=adminSessions&print=export&ismobile=false&form_session=539&presentations=show

?? Space as a container for understanding, organising information. Chorography, the writing of the region.
Tweet: In the spatial humanities panel where a speaker mentions chorography, which along with prosopography is my favourite digital-history-enabled-but-also-old concept
Daniel Alves. Do history and literature researchers feel the need to incorporate spatial analysis in their work? A large number who do don’t use GIS. Most of them don’t believe in it (!). The rest are so tired that they prefer theorising (!!) His goal, ref last night keynote, is not to build models, tools, the next great algorithm; it’s to advance knowledge in his specific field.
Tweet: @DanielAlvesFCSH Is #SpatialDH revolutionary? Do history and literature researchers feel the need to incorporate spatial analysis in their work? A large number who do don’t use GIS. Most of them don’t believe in it(!). The rest are so tired that they prefer theorising(!!)
Tweet: @DanielAlvesFCSH close reading is still essential to take in the inner subjectivity of historical / literary sources with a partial and biased conception of space and place
Tien Danniau, Ghent Centre for Digital Humanities – deep maps. How is the concept working for them?
Tweet: Deep maps! A slide showing some of the findings from the 2012 NEH Advanced Institute on spatial narratives and deep mapping, which is where I met many awesome DH and spatial history people #DH2019 pic.twitter.com/JiQepz7kH5
Katie McDonough, Spatial history between maps and texts: lessons from the 18thC. Refers to Richard White’s spatial history essay in her abstract. Rethinking geographic information extraction. Embedded entities, spatial relations, other stuff.
Tweet: @khetiwe24 references work discussed in https://www.tandfonline.com/doi/abs/10.1080/13658816.2019.1620235?journalCode=tgis20 noting how the process of annotating texts requires close reading that changes your understanding of place in the text (echoing @DanielAlvesFCSH’s earlier point)
Tweet: Final #spatialDH talk 'towards spatial linguistics' #DH2019 https://twitter.com/mia_out/status/1149666605258829824
Tweet: #DH2019 Preserving deep maps? I'd talk to folk in web archiving for a sense of which issues re recording complex, multi-format, dynamic items are tricky and which are more solvable

Closing keynote: Digital Humanities — Complexities of Sustainability, Johanna Drucker

(By this point my laptop and mental batteries were drained so I just listened and tweeted. I was also taking part in a conversation about the environmental sustainability of travel for conferences, issues with access to visas and funding, etc, that might be alleviated by better incorporating talks from remote presenters, or even having everyone present online.)

Finally, the DH2020 conference is calling for reviewers. Reviewing is an excellent way to give something back to the DH community while learning about the latest work as it appears in proposals, and perhaps more importantly, learning how to write a good proposal yourself. Find out more: http://dh2020.adho.org/cfps/reviewers/

'In search of the sweet spot: infrastructure at the intersection of cultural heritage and data science'

It's not easy to find the abstracts for presentations within panels on the Digital Humanities 2019 (DH2019) site, so I've shared mine here.

In search of the sweet spot: infrastructure at the intersection of cultural heritage and data science

Mia Ridge, British Library

My slides: https://www.slideshare.net/miaridge/in-search-of-the-sweet-spot-infrastructure-at-the-intersection-of-cultural-heritage-and-data-science


This paper explores some of the challenges and paradoxes in the application of data science methods to cultural heritage collections. It is drawn from long experience in the cultural heritage sector, predating but broadly aligned to the 'OpenGLAM' and 'Collections as Data' movements. Experiences that have shaped this thinking include providing open cultural data for computational use; creating APIs for catalogue and interpretive records; running hackathons; helping cultural organisations think through the preparation of 'collections as data'; and supervising undergraduate and MSc projects for students of computer science.

The opportunities are many. Cultural heritage institutions (aka GLAMs – galleries, libraries, archives and museums) hold diverse historical, scientific and creative works – images, printed and manuscript works, objects, audio or video – that could be turned into some form of digital 'data' for use in data science and digital humanities research. GLAM staff have expert knowledge about the collections and their value to researchers. Data scientists bring a rigour, specialist expertise and skills, and a fresh perspective to the study of cultural heritage collections.

While the quest to publish cultural heritage records and digital surrogates for use in data science is relatively new, the barriers within cultural organisations to creating suitable infrastructure with others are historically numerous. They include different expectations about the pace and urgency of work, different levels of technical expertise, resourcing and infrastructure, and different goals. They may even include different expectations about what 'data' is – metadata drawn from GLAM catalogues is the most readily available and shared data, but not only is this rarely complete, often untidy and inconsistent (being the work of decades or centuries and many hands over that time), it is also a far cry from datasets rich with images or transcribed text that data scientists might expect.

Copyright, data protection and commercial licensing can limit access to digitised materials (though this varies greatly). 'Orphaned works', where the rights holder cannot be traced in order to license the use of in-copyright works, mean that up to 40% of some collections, particularly sound or video collections, are unavailable for risk-free use (2012).

While GLAMs have experimented with APIs, downloadable datasets and SPARQL endpoints, they rarely have the resources or institutional will to maintain and refresh these indefinitely. Records may be available through multi-national aggregators such as Europeana, DPLA, or national aggregators, but as aggregation often requires that metadata is mapped to the lowest common denominator, their value for research may be limited.

The area of overlap between 'computationally interesting problems' and 'solutions useful for GLAMs' may be smaller than expected to date, but collaboration between cultural institutions and data scientists on shared projects in the 'sweet spot' – where new data science methods are explored to enhance the discoverability of collections – may provide a way forward. Sector-wide collaborations like the International Image Interoperability Framework (IIIF, https://iiif.io/) provide modern models for lightweight but powerful standards. Pilot projects with students or others can help test the usability of collection data and infrastructure while exploring the applicability of emerging technologies and methods. It is early days for these collaborations, but the future is bright.
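As an illustration of what 'lightweight but powerful' can mean in practice: a IIIF Image API request is just a URL with a fixed pattern of path segments, so anyone can ask a compliant server for a crop, a thumbnail or a rotated version of an image. This is a minimal sketch following the Image API 3.0 URL pattern. The server base URL and identifiers are placeholders, and servers implementing Image API 2.x accept `full` for the size parameter, which 3.0 replaced with `max`.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}."""
    # Identifiers may contain slashes or colons, so percent-encode them fully
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (sizes, tiles, supported features)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

For example, `iiif_image_url("https://example.org/iiif", "page-001", size="200,")` asks for a 200-pixel-wide thumbnail of the whole image, and a region such as `"100,100,500,500"` requests a crop. Because every compliant server understands the same pattern, a tool written against one collection can work with another.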

Panel overview

An excerpt from the longer panel description by David Beavan and Barbara McGillivray.

This panel highlights the emerging collaborations and opportunities between the fields of Digital Humanities (DH), Data Science (DS) and Artificial Intelligence (AI). It charts the enthusiastic progress of the Alan Turing Institute, the UK national institute for data science and artificial intelligence, as it engages with cultural heritage institutions and academics from arts, humanities and social sciences disciplines. We discuss the exciting work and learnings from various new activities, across a number of high-profile institutions. As these initiatives push the intellectual and computational boundaries, the panel considers the gains, benefits and complexities encountered. The panel then turns towards the future of such interdisciplinary working, considering how DS & DH collaborations can grow, with a view towards a manifesto. As Data Science grows globally, this panel session will stimulate new discussion and direction, to help ensure that the fields grow together, that arts & humanities remain a strong focus of DS & AI, and that DH methods and practices continue to benefit from new developments in DS which will enable future research avenues and questions.

'The Past, Present and Future of Digital Scholarship with Newspaper Collections'

It's not easy to find the abstracts for presentations within panels on the Digital Humanities 2019 (DH2019) site, so I've shared mine here. The panel was designed to bring together a range of interdisciplinary newspaper-based digital humanities and/or data science projects, with 'provocations' from two senior scholars who would provide context for current ambitions, and to start conversations among practitioners.

Short Paper: Living with Machines

Paper authors: Mia Ridge, Giovanni Colavizza with Ruth Ahnert, Claire Austin, David Beavan, Kaspar Beelens, Mariona Coll Ardanuy, Adam Farquhar, Emma Griffin, James Hetherington, Jon Lawrence, Katie McDonough, Barbara McGillivray, André Piza, Daniel van Strien, Giorgia Tolfo, Alan Wilson, Daniel Wilson.

My slides: https://www.slideshare.net/miaridge/living-with-machines-at-the-past-present-and-future-of-digital-scholarship-with-newspaper-collections-154700888


Living with Machines is a five-year interdisciplinary research project, whose ambition is to blend data science with historical enquiry to study the human impact of the industrial revolution. Set to be one of the biggest and most ambitious digital humanities research initiatives ever to launch in the UK, Living with Machines is developing a large-scale infrastructure to perform data analyses on a variety of historical sources, and in so doing provide vital insights into the debates and discussions taking place in response to today’s digital industrial revolution.

Seeking to make the most of a self-described 'radical collaboration', the project will iteratively develop research questions as computational linguists, historians, library curators and data scientists work on a shared corpus of digitised newspapers, books and biographical data (census, birth, death, marriage, etc. records). For example, in the process of answering historical research questions, the project could take advantage of access to expertise in computational linguistics to overcome issues with choosing unambiguous and temporally stable keywords for analysis, previously reported by others (Lansdall-Welfare et al., 2017). A key methodological objective of the project is to 'translate' history research questions into data models, in order to inspect and integrate them into historical narratives. In order to enable this process, a digital infrastructure is being collaboratively designed and developed, whose purpose is to marshal and interlink a variety of historical datasets, including newspapers, and allow for historians and data scientists to engage with them.

In this paper we will present our vision for Living with Machines, focusing on how we plan to approach it, and the ways in which digital infrastructure enables this multidisciplinary exchange. We will also showcase preliminary results from the different research 'laboratories', and detail the historical sources we plan to use within the project.

The Past, Present and Future of Digital Scholarship with Newspaper Collections

Mia Ridge (British Library), Giovanni Colavizza (Alan Turing Institute)

Historical newspapers are of interest to many humanities scholars, valued as sources of information and language closely tied to a particular time, social context and place. Following library and commercial microfilming and, more recently, digitisation projects, newspapers have been an accessible and valued source for researchers. The ability to use keyword searches through more data than ever before via digitised newspapers has transformed the work of researchers.[1]

Digitised historic newspapers are also of interest to many researchers who seek large bodies of relatively easily computationally-transcribed text on which they can try new methods and tools. Intensive digitisation over the past two decades has seen smaller-scale or repository-focused projects flourish in the Anglophone and European world (Holley, 2009; King, 2005; Neudecker et al., 2014). However, just as earlier scholarship was potentially over-reliant on The Times of London and other metropolitan dailies, this has been replicated and reinforced by digitisation projects (for a Canadian example, see Milligan 2013).

In recent years, several large consortia projects proposing to apply data science and computational methods to historical newspapers at scale have emerged, including NewsEye, impresso, Oceanic Exchanges and Living with Machines. This panel has been convened by some consortia members to cast a critical view on past and ongoing digital scholarship with newspaper collections, and to inform its future.

Digitisation can involve both complexities and simplifications. Knowledge about the imperfections of digitisation, cataloguing, corpus construction, text transcription and mining is rarely shared outside cultural institutions or projects. How can these imperfections and absences be made visible to users of digital repositories? Furthermore, how does the over-representation of some aspects of society through the successive winnowing and remediation of potential sources – from creation to collection, microfilming, preservation, licensing and digitisation – affect scholarship based on digitised newspapers? How can computational methods address some of these issues?

The panel proposes the following format: short papers will be delivered by existing projects working on large collections of historical newspapers, presenting their vision and results to date. Each project is at a different stage of development and will discuss their choice to work with newspapers, and reflect on what they have learnt to date on practical, methodological and user-focused aspects of this digital humanities work. The panel is additionally an opportunity to consider important questions of interoperability and legacy beyond the life of the project. Two further papers will follow, given by scholars with significant experience using these collections for research, in order to provide the panel with critical reflections. The floor will then open for debate and discussion.

This panel is a unique opportunity to bring senior scholars with a long perspective on the uses of newspapers in scholarship together with projects at formative stages. More broadly, convening this panel is an opportunity for the DH2019 community to ask their own questions of newspaper-based projects, and for researchers to map methodological similarities between projects. Our hope is that this panel will foster a community of practice around the topic and encourage discussions of the methodological and pedagogical implications of digital scholarship with newspapers.

[1] For an overview of the impact of keyword search on historical research see Putnam (2016) and Bingham (2010).