57 Varieties of Digital History? Towards the future of looking at the past

Back in November 2015, Tara Andrews invited me to give a guest lecture on 'digital history' for the Introduction to Digital Humanities course at the University of Bern, where she was then a professor. This is a slightly shortened version of my talk notes, finally posted in 2024 as I go back to thinking about what 'digital history' actually is.

Illustration of a tin of Heinz Cream of Tomato Soup '57 varieties'
I called my talk '57 varieties of digital history' as a play on the number of activities and outputs called 'digital history'. While digital history and digital humanities are often linked and have many methods in common, digital history also draws on the use of computers for quantitative work, and on digitisation projects undertaken in museums, libraries, archives and academia. Digital tools have enhanced many of the tasks in the research process (which itself has many stages – I find the University of Minnesota Libraries' model with its stages of 'discovering', 'gathering', 'creating' and 'sharing' useful), but at the moment the underlying processes often remain the same.

So, what is digital history?

…using computers for writing, publishing

A historian on twitter once told me about a colleague who said they're doing digital history because they're using PowerPoint. On reflection, I think they have a point. These simple tools might be linked to fairly traditional scholarship – writing journal articles or creating presentations – but text created in them is infinitely quotable, shareable, and searchable, unlike the more inert paper equivalents. Many scholars use Word documents to keep bits of text they've transcribed from historical source materials, or to keep track of information from other articles or books. These become part of their personal research collections, which can build up over years into substantial resources in their own right. Even 'helper' applications like reference managers such as Zotero or EndNote can free up significant amounts of time that can then be devoted to research.

…the study of computers

When some people hear 'digital history', they imagine that it's the study of computers, rather than the use of digital methods by historians. While this isn't a serious definition of digital history, it's a reminder that viewing digital tools through a history of science and technology lens can be fruitful.

…using digitised material

Digitisation takes many forms, including creating or transcribing catalogue records about heritage collections, writing full descriptions of items, and making digital images of books, manuscripts, artworks etc. Metadata – information about the item, such as when and where it was made – is the minimum required to make collections discoverable. Increasingly, new forms of photography may be applied to particular types of objects to capture more information than the naked eye can see. Text may be transcribed, place names mapped, marginalia annotated and more.

The availability of free (or comparatively inexpensive) historical records through heritage institutions and related commercial or grassroots projects means we can access historical material without having to work around physical locations and opening hours, negotiate entry to archives (some of which require users to be 'bona fide scholars'), or navigate unknown etiquettes. Text transcription allows readers who lack the skills to read manuscript or hand-written documents to make use of these resources, as well as making the text searchable.

For some historians, this is about as digital as they want to get. They're very happy with being able to access more material more conveniently; their research methods and questions remain largely unchanged.

…creating digital repositories

Most digitised items live in some broader system that aggregates and presents material from a particular institution, or related to a particular topic. While some digital repositories are based on sub-sets of official institutional collections, most aren't traditional 'archives'. One archivist describes digital repositories as a 'purposeful collection of surrogates'.

Repositories aren't always created by big, funded projects. Personal research collections assembled over time are one form of ad hoc repository – they may contain material from many different archives collected by one researcher over a number of years.

Themed collections may be the result of large, scholarly projects with formal partners who've agreed to contribute material about a particular time, place, group in society or topic. They might also be the result of work by a local history society with volunteers who digitise material and share it online.

'Commons' projects (like Flickr or Wikimedia Commons) tend to be less focused – they might contain collections from specific institutions, but these specific collections are aggregated into the whole repository, where their identity (and the provenance of individual items) may be subsumed. While 'commons' platforms technically enable sharing, the cultural practices around sharing are yet to change, particularly for academic historians and many cultural institutions.

Repositories can provide different functionality. In some 'scholarly workbenches' you can collect and annotate material; in others you can bookmark records or download images. They support different levels of access: some allow you to download and re-use material without restriction, some only allow non-commercial use, and some are behind paywalls.

…creating datasets

The Old Bailey Online project has digitised the proceedings of the Old Bailey, making court cases from 1674 to 1913 available online. They haven't just transcribed text from digital images; they've added structure to the text. For example, the defendant's name, the crime they were accused of and the victim's name have all been tagged. The addition of this structure means that the material can be studied as text, or analysed statistically.
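To make that concrete, here's a minimal sketch of what tagged text enables. The element names are invented for illustration (the project's actual markup is richer); the point is that the same passage can be read as prose or queried as data.

```python
# Illustrative only: invented tags, not the Old Bailey Online schema.
import xml.etree.ElementTree as ET

record = """
<trial date="1781-04-04">
  <person role="defendant">Mary Jones</person>
  stole a silver watch from
  <person role="victim">Thomas Smith</person>.
  <offence category="theft"/>
</trial>
"""

root = ET.fromstring(record)
defendants = [p.text for p in root.findall(".//person[@role='defendant']")]
offences = [o.get("category") for o in root.findall(".//offence")]
print(defendants, offences)  # ['Mary Jones'] ['theft']
```

Once thousands of trials are marked up this way, counting offences by category or by decade becomes a few lines of code rather than months of re-reading.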

Adding structure to data can enable innovative research activities. If the markup is well-designed, it can support the exploration of questions that were not envisaged when the data was created. Adding structure to other datasets may become less resource-intensive as new computational techniques become available.

…creating visualisations and innovative interfaces

Some people or projects create specialist interfaces to help people explore their datasets. Some are maps or timelines that help people understand the scope of a collection in time and place; others are more interpretive, presenting a scholarly argument through their arrangement of interface elements, the material they have assembled, the labels they use and the search or browse queries they support. Ideally, these interfaces should provide access to the original records underlying the visualisation so that scholars can investigate potential new research questions that arise from their use of the interface.

…creating linked data (going from strings to things)

As well as marking up records with information like 'this bit is a defendant's name', we can also link a particular person's name to other records about them online. One way to do this is to connect the name to published lists of names with stable identifiers. These stable identifiers mean that we can link any mention of a particular person in a text to the same identifier, so that 'Captain Cook' and 'James Cook' are understood to be different strings about the same person.

A screenshot of structured data on the dbpedia site, e.g. dbo:birthDate = 1728-01-01
dbpedia page for 'James Cook', 2015

This also helps create a layer of semantic meaning about these strings of text. Software can learn that strings that represent people can have relationships with other things – in this case, historical voyages, other people, natural history and ethnographic collections, and historical events.
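As a rough sketch of what that looks like in data terms – using the Python rdflib library and a real dbpedia identifier, but an invented example vocabulary and invented documents – two different strings in two different sources can point at the same 'thing':

```python
# 'Strings to things': both documents mention the same person, whatever
# string they use. The ex: vocabulary and document URIs are invented.
from rdflib import Graph, Literal, Namespace, URIRef

dbr = Namespace("http://dbpedia.org/resource/")
ex = Namespace("http://example.org/vocab/")
cook = dbr["James_Cook"]  # a real, stable dbpedia identifier

g = Graph()
g.add((URIRef("http://example.org/doc/1"), ex.mentions, cook))
g.add((URIRef("http://example.org/doc/1"), ex.asString, Literal("Captain Cook")))
g.add((URIRef("http://example.org/doc/2"), ex.mentions, cook))
g.add((URIRef("http://example.org/doc/2"), ex.asString, Literal("James Cook")))

print(g.serialize(format="turtle"))
```

Any query for documents mentioning dbr:James_Cook now finds both, regardless of which string appears on the page.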

…applying computational methods and tools to digitised sources

So far, much of what we've seen has relied heavily on manual processing – someone has had to sit at a desk and decide which bit of text is about the defendant and which about the victim in an Old Bailey case.

So people are developing software algorithms to find concepts – people, places, events, etc – within text. This is partly a response to the amount of digitised text now available, and partly a recognition of the power of structured data. Techniques like 'named entity recognition' help create structure from unstructured data, allowing it to be queried, contextualised and presented in more powerful ways.

The named entity recognition software here [screenshot lost?] knows some things about the world – the names of places, people, dates, some organisations. It also gets lots of things wrong – it doesn't understand 'category five storm' as a concept, it mixes up people and organisations – but as a first pass, it has potential. Software can be trained to understand the kinds of concepts and things that occur in particular datasets. This also presents a problem for historians, who may have to use software trained for modern, commercial data.
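For a sense of how little code a first pass takes, here's a minimal sketch using spaCy, an open-source NLP library, and its small English model. As noted above, models like this are trained on modern text, so expect mistakes on historical sources.

```python
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Captain James Cook sailed from Plymouth in August 1768 "
          "on a voyage organised by the Royal Society.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Typical (not guaranteed) output:
#   James Cook PERSON / Plymouth GPE / August 1768 DATE / the Royal Society ORG
```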

This is part of a wider exploration of 'distant reading', methods for understanding what's in a corpus by processing the text en masse rather than by reading each individual novel or document. For example, it might be used to find linguistic differences between genres of literature, or between authors from different countries.

In this example [screenshot of topic modelling lost?], statistically unlikely combinations of words have been grouped together into 'topics'. This provides a form of summary of the contents of text files.
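A toy version of that process, using scikit-learn's implementation of Latent Dirichlet Allocation on four invented 'documents', looks something like this – real corpora need far more text and preprocessing:

```python
# A minimal topic modelling sketch: co-occurring words are grouped into
# 'topics'. The documents are invented; results on toy data are illustrative.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "ship voyage captain crew harbour storm",
    "court trial defendant victim theft verdict",
    "crew storm harbour ship anchor voyage",
    "verdict judge court theft trial prison",
]

vectoriser = CountVectorizer()
counts = vectoriser.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

words = vectoriser.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [words[j] for j in topic.argsort()[-4:]]
    print(f"topic {i}:", ", ".join(top_words))
```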

Image tagging – 'machine learning' techniques mean that software can learn how to do things rather than having to be precisely programmed in advance. This will have more impact on the future of digital history as these techniques become mainstream.

Audio tagging – software suggests tags, humans verify them. Quicker than doing them from scratch, but it's possible for software to miss significant moments that a person would spot (famous voices, cultural references, etc).

Handwritten text recognition will transform manuscript sources as much as optical character recognition has transformed typed sources!

…studying born-digital material (web archives, social media corpora etc)

Important historical moments, such as the 'Arab spring', happened on social media platforms like Twitter, YouTube and Facebook. The British Library and the Internet Archive have various 'snapshots' of websites, but they can only hope to capture a part of online material. We've already lost significant chunks of web history – every time a social media platform is shut down without being archived, future historians lose valuable data. (Not to mention people's personal data losses.)

This also raises questions about how we should study 'digital material culture'. Websites like Facebook really only make sense when they're used in a social context. The interaction design of 'likes' and comments, the way a newsfeed is constructed in seconds based on a tiny part of everything done in your network – these are hard to study as a series of static screenshots or data dumps.

…sharing history online

Sharing research outputs is great. At some point it starts to intersect with public history. But questions remain about 'broadcast' vs 'discursive' modes of public history – could we do more than model old formats online? Websites and social media can be just as one-way as broadcast television unless they're designed for two-way participation.

What's missing?

Are there other research objects or questions that should be included under the heading 'digital history'? [A question to allow for discussion time]

Towards the future of looking at the past

To sum up what we've covered so far – we've seen the transformation of unorganised, unprocessed data into 'information' through research activities like 'classification, rearranging/sorting, aggregating, performing calculations, and selection'.

Historical material is being transformed from a 'page' to a 'dataset'. As some of this process is automated, it raises new questions – how do we balance the convenience of automatic processing with the responsibility to review and verify the results? How do we convey the processes that went into creating a dataset so that another researcher can understand its gaps, the mixture of algorithmic and expert processes applied to it? My work at the British Library has made the importance of versioning a dataset or corpus clear – if a historian bases an argument on one version of OCR text, and the next version is better, they should be able to link to the version they based their work on.

We've thought about how digital text and media allows for new forms of analysis, using methods such as data visualisation, topic modelling or data mining. These methods can yield new insights and provoke new research questions, but most are not yet accessible to the ordinary historian. While automated processes help, preparing data for digital history is still incredibly detailed, time-consuming work.

What are the pros and cons of the forms of digital history discussed?

Cons

The ability to locate records on consumer-facing services like Google Maps is valuable, but commercial, general use mapping tools are not always suitable for historical data, which is often fuzzy, messy, and of highly variable coverage and precision. For example, placing text or points on maps can suggest a degree of certainty not supported by the data. Locating historical addresses can be inherently uncertain in instances where street numbers were not yet in use, but most systems expect a location to be placed as a precise dot (point of interest) on a map; drawing a line to mark a location would at least allow the length of a street to be marked as a possible address.
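One way to avoid that false precision, sketched below in GeoJSON (a real, widely supported format; the coordinates and property names are invented), is to record an address known only to street level as a line along the street rather than a single confident point:

```python
# False precision: a single point implies we know the exact spot.
falsely_precise = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-0.1074, 51.5140]},
    "properties": {"address": "Fleet Street (number unknown)"},
}

# Honest fuzziness: the whole street is the possible location.
honestly_fuzzy = {
    "type": "Feature",
    "geometry": {
        "type": "LineString",
        "coordinates": [[-0.1102, 51.5142], [-0.1046, 51.5137]],
    },
    "properties": {
        "address": "Fleet Street (number unknown)",
        "certainty": "street-level only",
    },
}
```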

There is an unmet need for everyday geospatial tools suitable for historians. For example, those with datasets containing historical locations would appreciate the ability to map addresses from specific periods on georeferenced and georectified historical maps, with the results displayable on either a modern, copyright-free map or the historical map itself. Similarly, biographical software, particularly when used for family history or collaborative prosopographical and community history projects, would benefit from the ability to record the degree of certainty for potential-but-not-yet-proven relationships or identifications, and to link uncertain information to specific individuals.

The complexity of some software packages (or the combination of packages assembled to meet various needs) is a barrier for those short on time, unable to access dedicated support or training, or who do not feel capable of learning the specialist jargon and skills required to assess and procure software to meet their needs. The need for equipment and software licences can be a financial barrier; unclear licensing requirements and costs for purchasing high-resolution historical maps are another. Copyright and licensing are also complex issues.

Sensible historians worry about the sustainability of digital sites – their personal research collection might be around for 30 years or more, and they want to cite material that will still be findable later.

There are issues with representing historical data, particularly in modern tools that cannot represent uncertainty or contingency. Here [screenshot lost?] the curator's necessarily fuzzy label of 'early 17th century' has been assigned a falsely precise date. Many digital tools are not (yet) suitable for historical data: their abilities have been over-stated, or their limits not clearly communicated or understood.
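Date standards can help here. The Library of Congress's Extended Date/Time Format (EDTF) can express approximate dates and intervals; a sketch of the difference (with invented field names) might look like:

```python
# What the system in the screenshot effectively stored:
falsely_precise = {"label": "early 17th century", "date": "1600-01-01"}

# What the curator actually knew, as an EDTF interval plus sortable
# bounds for systems that need plain numbers. The 1600-1635 range is one
# plausible reading of 'early 17th century', not a standard definition.
honestly_fuzzy = {
    "label": "early 17th century",
    "date_edtf": "1600/1635",
    "date_earliest": 1600,
    "date_latest": 1635,
}
```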

Very few peer-reviewed journals are able to host formats other than articles, inhibiting historians' ability to explore emerging digital formats for presenting research.

Faculty historians might dream of creating digital projects tailored for the specific requirements of their historical dataset, research question and audience, but their peers may not be confident in their ability to evaluate the results and assign credit appropriately.

Pros

Material can be recontextualised, transcluded and linked; the distance between a reference and the original item is reduced to just a link (unless a paywall etc gets in the way). Material can be organised in multiple ways independent of its physical location. Digital tools can represent multiple commentaries or assertions on a single image or document through linked annotations.

Computational techniques for processing data could reduce the gap between well-funded projects and others, thereby reducing the likelihood of digital history projects reinscribing the canon.

Digitised resources have made it easier to write histories of ordinary lives. You can search through multiple databases to quickly collate biographical information (births, deaths, marriages etc) and other instances where a person's existence might be documented. This isn't just a change in speed, but also in the accessibility of resources without travel or expense.

Screenshot of a IIIF viewer showing search results highlighted on a digitised historical text
Wellcome's IIIF viewer showing a highlighted search result

Search – any word in a digitised text can be a search result – we're not limited to keywords in a catalogue record. We can also discover some historical material via general search engines. Phonetic and fuzzy searches have also improved the ability to discover sources.

Historians like Professor Katrina Navickas have shown new models for the division of labour between people and software; previously most historical data collection and processing was painstakingly done by historians. She and others have shown how digital techniques can be applied to digitised sources in the pursuit of a historical research question.

Conclusion and questions: digital history, digital historiography?

The future is here, it's just not evenly distributed (this is the downer bit)

Academic historians might find it difficult to explore new forms of digital creation if they are hindered by the difficulties of collaborating on interdisciplinary digital projects and their need for credit and attribution when publishing data or research. More advanced forms of digital history also require access to technical expertise. While historians should know the basics of computational thinking, most may not be able to train as both a programmer and a historian – how much should we expect people to know about making software?

I've hinted at the impact of convenience in accessing digitised historical materials, and in those various stages of 'discovering', 'gathering', 'creating' and 'sharing'… We must also consider how experiences of digital technologies have influenced our understanding of what is possible in historical research, and the factors that limit the impact of digital technologies. The ease with which historians transform data from text notes to spreadsheets to maps to publications and presentations is almost taken for granted, but it shows the impact of digitality on enhancing everyday research practices.

So digital history has potential, is being demonstrated, but there's more to do…

From piles of material to patchwork: How do we embed the production of usable collections data into library work?

These notes were prepared for a panel discussion at the 'Always Already Computational: Collections as Data' (#AACdata) workshop, held in Santa Barbara in March 2017. While my latest thinking on the gap between the scale of collections and the quality of data about them is informed by my role in the Digital Scholarship team at the British Library, I've also drawn on work with catalogues and open cultural data at Melbourne Museum, the Museum of London, the Science Museum and various fellowships. My thanks to the organisers and the Institute of Museum and Library Services for the opportunity to attend. My position paper was called 'From libraries as patchwork to datasets as assemblages?' but in hindsight, piles and patchwork of material seemed a better analogy.

The invitation to this panel asked us to share our experience and perspective on various themes. I'm focusing on the challenges in making collections available as data, based on years of working towards open cultural data from within various museums and libraries. I've condensed my thoughts about the challenges down into the question on the slide: How do we embed the production of usable collections data into library work?

It has to be usable, because if it's not then why are we doing it? It has to be embedded because data in one-off projects gets isolated and stale. 'Production' is there because infrastructure and workflows are unsexy but necessary for access to the material that makes digital scholarship possible.

One of the biggest issues the British Library (BL) faces is scale. The BL's collections are vast – maybe 200 million items – and extremely varied. My experience shows that publishing datasets (or sharing them with aggregators) exposes the shortcomings of past cataloguing practices, making the size of the backlog all too apparent.

Good collections data (or metadata, depending on how you look at it) is necessary to avoid the overwhelmed, jumble sale feeling of using a huge aggregator like Europeana, Trove, or the DPLA, where you feel there's treasure within reach, if only you could find it. Publishing collections online often increases the number of enquiries about them – how can institutions deal with enquiries at scale when they already have a cataloguing backlog? Computational methods like entity identification and extraction could complement the 'gold standard' cataloguing already in progress. If they're made widely available, these other methods might help bridge the resourcing gaps that mean it's easier to find items from richer institutions and countries than from poorer ones.

Photo of piles of material
You probably already all know this, but it's worth remembering: our collections aren't even (yet) a patchwork of materials. The collections we hold, and the subset we can digitise and make available for re-use, are only a tiny proportion of what once existed. Each piece was once part of something bigger, and what we have now has been shaped by cumulative practical and intellectual decisions made over decades or centuries. Digitisation projects range from tiny specialist databases to huge commercial genealogy deals, while some areas of the collections don't yet have digital catalogue records. Some items can't be digitised because they're too big, small or fragile for scanning or photography; others can't be shared because of copyright, data protection or cultural sensitivities. We need to be careful in how we label datasets so that the absences are evident.

(Here, 'data' may include various types of metadata, automatically generated OCR or handwritten text recognition transcripts, digital images, audio or video files, crowdsourced enhancements or any combination of these and more.)

Image credit: https://www.flickr.com/photos/teen_s/6251107713/

In addition to the incompleteness or fuzziness of catalogue data, when collections appear as data, it's often as great big lumps of things. It's hard for normal scholars to process (or just unzip) 4GB of data.

Currently, datasets are often created outside normal processes, and over time they become 'stale' as they're not updated when source collection records change. And when scholars do manage to unzip them, the records rely on internal references – name authorities for people, places, etc – that can only be seen as strings rather than things until extra work is undertaken.

The BL's metadata team have experimented with 'researcher format' CSV exports around specific themes (e.g. an exhibition), and CSV is undoubtedly the most accessible format – but what we really need is the ability for people to create their own queries across catalogues, and create their own datasets from the results. (And by queries I don't mean SPARQL but rather faceted browsing or structured search forms.)

Image credit: screenshot from http://data.bl.uk/
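Part of CSV's appeal is how little stands between the file and a researcher's own subset. A sketch of the kind of query a faceted search form might generate, using pandas and an invented file with invented column names:

```python
import pandas as pd

records = pd.read_csv("exhibition_records.csv")  # hypothetical export

# The sort of facets a structured search form might offer:
subset = records[
    (records["place_of_publication"] == "London")
    & (records["date"].between(1850, 1870))
]
subset.to_csv("my_dataset.csv", index=False)  # the researcher's own dataset
```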

Collections are huge (and resources relatively small) so we need to supplement manual cataloguing with other methods. Sometimes the work of crafting links from catalogues to external authorities and identifiers will be a machine job, with pieces sewn together at industrial speed via entity recognition tools that can pull categories out of text and images. Sometimes it's operated by a technologist who runs records through OpenRefine to find links to name authorities or Wikidata records. Sometimes it's a labour of scholarly love, with links painstakingly researched, hand-tacked together to make sure they fit before they're finally recorded in a bespoke database.
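The machine end of that spectrum can be as simple as asking Wikidata for candidate identifiers for a name string – the API call below is real (MediaWiki's wbsearchentities endpoint); choosing the right candidate is the expert, human part:

```python
import requests

response = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbsearchentities",
        "search": "James Cook",  # the string we want to turn into a thing
        "language": "en",
        "format": "json",
    },
    headers={"User-Agent": "collections-linking-sketch/0.1"},
)

# Several James Cooks will come back; disambiguation still needs a human.
for candidate in response.json()["search"][:5]:
    print(candidate["id"], "-", candidate.get("description", "no description"))
```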

This linking work often happens outside the institution, so how can we ingest and re-use it appropriately? And if we're to take advantage of computational methods and external enhancements, then we need ways to signal which categories were applied by cataloguers, which by software, which by external groups, etc.

The workflow and interface adjustments required would be significant, but even more challenging would be the internal conversations and changes required before a consensus on the best way to combine the work of cataloguers and computers could emerge.

The trick is to move from a collection of pieces to pieces of a collection. Every collection item was created in and about places, and produced by and about people. They have creative, cultural, scientific and intellectual properties. There's a web of connections from each item that should be represented when they appear in datasets. These connections help make datasets more usable, turning strings of text into references to things and concepts to aid discoverability and the application of computational methods by scholars. This enables structured search across datasets – potentially linking an oral history interview with a scientist in the BL sound archive, their scientific publications in journals, annotated transcriptions of their field notebooks from a crowdsourcing project, and published biography in the legal deposit library.

A lot of this work has already been done as authority files like AAT, ULAN etc are applied in cataloguing, so our attention should turn to converting local references into URIs and making the most of that investment.

Applying identifiers is hard – it takes expert care to disambiguate personal names, places and concepts, even with all the hinting that context-aware systems might be able to provide as machine learning and related techniques get better. Catalogues can't easily record possible attributions, and there's understandable reluctance to publish an imperfect record, so progress on the backlog is slow. If we're not to be held back by the need for records to be perfectly complete before they're published, then we need to design systems capable of capturing the ambiguity, fuzziness and inherent messiness of historical collections, and of allowing qualified descriptors for possible links to people, places etc. Then we need to explain the difference to users, so that they don't rely too heavily on our descriptions, making assumptions about the presence or absence of information when that's not appropriate.

Image credit: http://europeana.eu/portal/record/2021648/0180_N_31601.html

Photo of pipes over a building
A lot of what we need relies on more responsive infrastructure for workflows and cataloguing systems. For example, the BL's systems are designed around the 'deliverable unit' – the printed or bound volume, the archive box – because for centuries the reading room was where you accessed items. We now need infrastructure that makes items addressable at the manuscript, page and image level in order to make the most of the annotations and links created to shared identifiers.
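Standards like the IIIF Presentation API already model this kind of addressability. A heavily simplified sketch (all URIs invented) of how a volume, a page and an image each get their own URI, so annotations can point below the 'deliverable unit':

```python
manifest = {
    "@id": "https://example.org/iiif/ms-1234/manifest",  # the bound volume
    "label": "Manuscript 1234",
    "sequences": [{
        "canvases": [{
            "@id": "https://example.org/iiif/ms-1234/canvas/f1r",  # one page
            "label": "f. 1r",
            "images": [
                # the image painted onto that page, itself addressable
                {"@id": "https://example.org/iiif/ms-1234/f1r/annotation/1"}
            ],
        }],
    }],
}
```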

(I'd love to see absorbent workflows, soaking up any related data or digital surrogates that pass through an organisation, no matter which system they reside in or originate from. We aren't yet making the most of OCRd text, let alone enhanced data from other processes, to aid discoverability or produce datasets from collections.)

Image credit: https://www.flickr.com/photos/snorski/34543357
My final thought – we can start small and iterate, which is just as well, because we need to work on understanding what users of collections data need and how they want to use it. We're making a start and there's a lot of thoughtful work behind the scenes, but a bit more investment may be needed if research libraries are to become as comfortable with data users as they are with the readers who pass through their physical doors.

The rise of interpolated content?

One thing that might stand out when we look back at 2014 is the rise of interpolated content. We've become used to working around auto-correct errors in texts and emails, but we seem to be at a tipping point where software goes ahead and rewrites content rather than prompting you to notice and edit things yourself.

iOS doesn't just highlight or fix typos, it changes the words you've typed. To take one example, iOS users might use 'ill' more often than 'ilk', but if I type 'ilk' I'm not happy when it's replaced by an algorithmically determined 'ill'. As a side note, understanding the effect of auto-correct on written messages will be a challenge for future historians (much as it is for us sometimes now).

And it's not only text. In 2014, Adobe previewed GapStop, 'a new video technology that eases transitions and removes pauses from video automatically'. It's not just editing out pauses, it's creating filler images from existing images to bridge the gaps so the image doesn't jump between cuts. It makes it a lot harder to tell when someone's words have been edited to say something different to what they actually said – again, editing audio and video isn't new, but making it so easy to remove the artefacts that previously provided clues to the edits is.

Photoshop has long let you edit the contrast and tone in images, but now their Content-Aware Move, Fill and Patch tools can seamlessly add, move or remove content from images, making it easy to create 'new' historical moments. The images on extrapolated-art.com, which uses '[n]ew techniques in machine learning and image processing […] to extrapolate the scene of a painting to see what the full scenery might have looked like' show the same techniques applied to classic paintings.

But photos have been manipulated since they were first used, so what's new? As one Google user reported in It’s Official: AIs are now re-writing history, 'Google’s algorithms took the two similar photos and created a moment in history that never existed, one where my wife and I smiled our best (or what the algorithm determined was our best) at the exact same microsecond, in a restaurant in Normandy.' The important difference here is that he did not create this new image himself: Google's scripts did, without asking or specifically notifying him. In twenty years' time, this fake image may become part of his 'memory' of the day. Automatically generated content like this also takes the question of intent entirely out of the process of determining 'real' from interpolated content. And if software starts retrospectively 'correcting' images, what does that mean for our personal digital archives, for collecting institutions and for future historians?

Interventions between the act of taking a photo and posting it on social media might be one of the trends of 2015. Facebook are about to start 'auto-enhancing' your photos, and apparently, Facebook Wants To Stop You From Uploading Drunk Pictures Of Yourself. Apparently this is to save your mum and boss seeing them; the alternative path of building a social network that doesn't show everything you do to your mum and boss was lost long ago. Would the world be a better place if Facebook or Twitter had a 'this looks like an ill-formed rant, are you sure you want to post it?' function?

So 2014 seems to have brought the removal of human agency from the process of enhancing, and even creating, text and images. Algorithms writing history? Where do we go from here? How will we deal with the increase of interpolated content when looking back at this time? I'd love to hear your thoughts.

Looking for (crowdsourcing) love in all the right places

One of the most important exercises in the crowdsourcing workshops I run is the 'speed dating' session. The idea is to spend some time looking at a bunch of crowdsourcing projects until you find a project you love. Finding a project you enjoy gives you a deeper insight into why other people participate in crowdsourcing, and will see you through the work required to get a crowdsourcing project going. I think making a personal connection like this helps reduce some of the cynicism I occasionally encounter about why people would volunteer their time to help cultural heritage collections. Trying lots of projects also gives you a much better sense of the types of barriers projects can accidentally put in the way of participation. It's also a good reminder that everyone is a nerd about something, and that there's a community of passion for every topic you can think of.

If you want to learn more about designing history or cultural heritage crowdsourcing projects, trying out lots of projects is a great place to start. The more time you can spend on this the better – an hour is ideal – but trying just one or two projects is better than nothing. In a workshop I get people to note how a project made them feel – what they liked most and least about it, and who they'd recommend it to. You can also note the input and output types to help build your mental database of relevant crowdsourcing projects.

The list of projects I suggest varies according to the background of workshop participants, and I'll often throw in suggestions tailored to specific interests, but here's a generic list to get you started.

10 Most Wanted http://10most.org.uk/ Research object histories
Ancient Lives http://ancientlives.org/ Humanities, language, text transcription
British Library Georeferencer http://www.bl.uk/maps/ Locating and georeferencing maps (warning: if it's running, only hard maps may be left!)
Children of the Lodz Ghetto http://online.ushmm.org/lodzchildren/ Citizen history, research
Describe Me http://describeme.museumvictoria.com.au/ Describe objects
DIY History http://diyhistory.lib.uiowa.edu/ Transcribe historical letters, recipes, diaries
Family History Transcription Project http://www.flickr.com/photos/statelibrarync/collections/ Document transcription (Flickr/Yahoo login required to comment)
Herbaria@home http://herbariaunited.org/atHome/ (for bonus points, compare it with Notes from Nature https://www.zooniverse.org/project/notes_from_nature) Transcribing specimen sheets (or biographical research)
HistoryPin Year of the Bay 'Mysteries' https://www.historypin.org/attach/project/22-yearofthebay/mysteries/index/ Help find dates, locations, titles for historic photographs; overlay images on StreetView
iSpot http://www.ispotnature.org/ Help 'identify wildlife and share nature'
Letters of 1916 http://dh.tcd.ie/letters1916/ Transcribe letters and/or contribute letters
London Street Views 1840 http://crowd.museumoflondon.org.uk/lsv1840/ Help transcribe London business directories
Micropasts http://crowdsourced.micropasts.org/app/photomasking/newtask Photo-masking to help produce 3D objects; also structured transcription
Museum Metadata Games: Dora http://museumgam.es/dora/ Tagging game with cultural heritage objects (my prototype from 2010)
NYPL Building Inspector http://buildinginspector.nypl.org/ A range of tasks, including checking building footprints, entering addresses
Operation War Diary http://operationwardiary.org/ Structured transcription of WWI unit diaries
Papers of the War Department http://wardepartmentpapers.org/ Document transcription
Planet Hunters http://planethunters.org/ Citizen science; review visualised data
Powerhouse Museum Collection Search http://www.powerhousemuseum.com/collection/database/menu.php Tagging objects
Reading Experience Database http://www.open.ac.uk/Arts/RED/ Text selection, transcription, description.
Smithsonian Digital Volunteers: Transcription Center https://transcription.si.edu/ Text transcription
Tiltfactor Metadata Games http://www.metadatagames.org/ Games with cultural heritage images
Transcribe Bentham http://www.transcribe-bentham.da.ulcc.ac.uk/ History; text transcription
Trove http://trove.nla.gov.au/newspaper?q= Correct OCR errors, transcribe text, tag or describe documents
US National Archives http://www.amara.org/en/teams/national-archives/ Transcribing videos
What's the Score at the Bodleian http://www.whats-the-score.org/ Music and text transcription, description
What's on the menu http://menus.nypl.org/ Structured transcription of restaurant menus
What's on the menu? Geotagger http://menusgeo.herokuapp.com/ Geolocating historic restaurant menus
Wikisource – random item link http://en.wikisource.org/wiki/Special:Random/Index Transcribing texts
Worm Watch http://www.wormwatchlab.org Citizen science; video
Your Paintings Tagger http://tagger.thepcf.org.uk/ Paintings; free-text or structured tagging

NB: crowdsourcing is a dynamic field; some sites may be temporarily out of content or have otherwise settled in transit. Some sites require registration, so you may need to find another site to explore while you're waiting for your registration email.

It's here! Crowdsourcing our Cultural Heritage is now available

My edited volume, Crowdsourcing our Cultural Heritage, is now available! My introduction (Crowdsourcing our cultural heritage: Introduction), which provides an overview of the field and outlines the contribution of the 12 chapters, is online at Ashgate's site, along with the table of contents and index. There's a 10% discount if you order online.

If you're in London on the evening of Thursday 20th November, we're celebrating with a book launch party at the UCL Centre for Digital Humanities. Register at http://crowdsourcingculturalheritage.eventbrite.co.uk.

Here's the back page blurb: "Crowdsourcing, or asking the general public to help contribute to shared goals, is increasingly popular in memory institutions as a tool for digitising or computing vast amounts of data. This book brings together for the first time the collected wisdom of international leaders in the theory and practice of crowdsourcing in cultural heritage. It features eight accessible case studies of groundbreaking projects from leading cultural heritage and academic institutions, and four thought-provoking essays that reflect on the wider implications of this engagement for participants and on the institutions themselves.

Crowdsourcing in cultural heritage is more than a framework for creating content: as a form of mutually beneficial engagement with the collections and research of museums, libraries, archives and academia, it benefits both audiences and institutions. However, successful crowdsourcing projects reflect a commitment to developing effective interface and technical designs. This book will help practitioners who wish to create their own crowdsourcing projects understand how other institutions devised the right combination of source material and the tasks for their ‘crowd’. The authors provide theoretically informed, actionable insights on crowdsourcing in cultural heritage, outlining the context in which their projects were created, the challenges and opportunities that informed decisions during implementation, and reflecting on the results.

This book will be essential reading for information and cultural management professionals, students and researchers in universities, corporate, public or academic libraries, museums and archives."

Massive thanks to the following authors of chapters for their intellectual generosity and their patience with up to five rounds of edits, plus proofing, indexing and more…

  1. Crowdsourcing in Brooklyn, Shelley Bernstein;
  2. Old Weather: approaching collections from a different angle, Lucinda Blaser;
  3. ‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections, Tim Causer and Melissa Terras;
  4. Build, analyse and generalise: community transcription of the Papers of the War Department and the development of Scripto, Sharon M. Leon;
  5. What's on the menu?: crowdsourcing at the New York Public Library, Michael Lascarides and Ben Vershbow;
  6. What’s Welsh for ‘crowdsourcing’? Citizen science and community engagement at the National Library of Wales, Lyn Lewis Dafis, Lorna M. Hughes and Rhian James;
  7. Waisda?: making videos findable through crowdsourced annotations, Johan Oomen, Riste Gligorov and Michiel Hildebrand;
  8. Your Paintings Tagger: crowdsourcing descriptive metadata for a national virtual collection, Kathryn Eccles and Andrew Greg;
  9. Crowdsourcing: Crowding out the archivist? Locating crowdsourcing within the broader landscape of participatory archives, Alexandra Eveleigh;
  10. How the crowd can surprise us: humanities crowdsourcing and the creation of knowledge, Stuart Dunn and Mark Hedges;
  11. The role of open authority in a collaborative web, Lori Byrd Phillips;
  12. Making crowdsourcing compatible with the missions and values of cultural heritage organisations, Trevor Owens.

Who loves your stuff? How to collect links to your site

If you've ever wondered who's using content from your site or what people find interesting, here are some ways to find out, using the Design Museum's URL as an example.

'Links to your site' via Google Webmaster Tools https://support.google.com/webmasters/answer/55281

Reddit – plug your URL in after /domain/ (this listing is also available as JSON – see the sketch below)
http://www.reddit.com/domain/designmuseum.org

Wikipedia – plug your URL in after target=
http://en.wikipedia.org/w/index.php?title=Special%3ALinkSearch&target=*.designmuseum.org
Depending on your topic coverage you may want to look at other language Wikipedias.

Pinterest – plug your URL in after /source/
http://www.pinterest.com/source/designmuseum.org/

Twitter – search for the URL with quotes around it e.g. "designmuseum.org"

If you can see one particular image shooting up in your web stats, you could try a reverse image search on TinEye to see where it's being referenced.
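Several of these checks can be scripted. For example, Reddit's /domain/ listing above is also available as JSON if you append .json to the URL (a real, though rate-limited, behaviour; Reddit expects a descriptive User-Agent):

```python
import requests

domain = "designmuseum.org"
response = requests.get(
    f"https://www.reddit.com/domain/{domain}.json",
    headers={"User-Agent": "who-loves-your-stuff-sketch/0.1"},
)

# Each child is a Reddit post linking to a page on your domain.
for post in response.json()["data"]["children"]:
    data = post["data"]
    print(data["score"], data["title"], data["permalink"])
```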

What am I missing? I'd love to hear about similar links and methods for other sites – tell me in the comments or on twitter @mia_out.

Update: in a similar vein, Tim Sherratt @wragge launched a new experiment called Trove Traces the same day, to 'explore how Trove newspapers are used' by listing pages that link to articles: http://trovespace.webfactional.com/traces/

Update 2: Desi Gonzalez @desigonz tried out some of these techniques and put together a great post on 'Thoughts on what museums can learn from Reddit, Yelp, and what @briandroitcour calls vernacular criticism'

The sounds of silence

I've been reading World War One diaries and letters (getting distracted by sources is an occupational hazard in my research) as I look for sample primary sources for teaching crowdsourcing at the HILT summer school in Maryland next week and for my CENDARI fellowship later this year.

I noticed one line in the Diary of William Henry Winter WWI 1915 that manages to convey a lot without directly giving any information about his opinions or relationship with this person:

'Major Saunders is supposed to be on his way back here as well but I don't know as he is coming back to our Coy, I hope not any way. We have got a good man now.'

There's nothing in the rest of the entries online that provides any further background. It may be that sections of this correspondence either didn't survive, weren't held by the same person, or perhaps were edited before deposit with the library or during transcription (it's particularly hard to judge as the site doesn't have images of the original document), so this particular silence may not have been intentional.

Whatever the case, it's a good reminder that there are silences behind every piece of content. While it's an amazing time to research the lives of those caught up in WWI as more and more private and public material is digitised and shared, silences can be created in many ways – official archives privilege some voices over others, personal collections can be censored or remain tucked away in a shoebox, and large parts of people's experiences simply went unrecorded. Content hidden behind paywalls or inaccessible to search engines (whether inadvertently hidden behind a search box or through lack of text transcription or description) is effectively hushed, if not exactly silenced. Sources and information about WWI collected via community groups on Facebook may be lost the next time they change their terms and conditions, or only partially shared. Our challenge is to make the gaps and questions about what was collected visible (audible?) while also being careful not to render the undigitised or unsearchable invisible in our rush to privilege the easily-accessible.

[Update: I've just realised that Winter might not have needed to provide further context, as it seems many men in his unit were from the same region as him, and therefore his relationship with the Major may have pre-dated the war. Tacit knowledge is of course another example of the unrecorded, and one perhaps more familiar to us now than the unsayable.]

Piloting a Participatory History Commons

I've been awarded a CENDARI Visiting Research Fellowship at Trinity College Dublin for a project called 'Bridging collections with a participatory Commons: a pilot with World War One archives'. I've posted my proposal at the link above, and when I start in September I'll post about my progress here. CENDARI have now published the list of all 2014 Fellows and a neat summary of the programme: 'The CENDARI Visiting Research Fellowships are intended to support and stimulate historical research in the two pilot areas of medieval European culture and the First World War, by facilitating access to key archives, specialist knowledge and collections in CENDARI host institutions'.

As I said in my post, 'it's an ambitious project which requires tackling community building, user experience design, historical materials and programming, and I'll be drawing on the expertise of many people'. I'll post as I go – but first, I'd best get back to finishing up my PhD thesis!

In the meantime, here's a small collection of things I've written as I think through what a participatory commons is and how it might work: my poster and talk notes for the Herrenhausen conference and my keynote for Sharing is Caring, 'Enriching cultural heritage collections through a Participatory Commons platform: a provocation about collaborating with users'.

How can we connect museum technologists with their history?

A quick post triggered by an article on the role of domain knowledge (knowledge of a field) in critical thinking, Deep in thought:

Domain knowledge is so important because of the way our memories work. When we think, we use both working memory and long-term memory. Working memory is the space where we take in new information from our environment; everything we are consciously thinking about is held there. Long-term memory is the store of knowledge that we can call up into working memory when we need it. Working memory is limited, whereas long-term memory is vast. Sometimes we look as if we are using working memory to reason, when actually we are using long-term memory to recall. Even incredibly complex tasks that seem as if they must involve working memory can depend largely on long-term memory.

When we are using working memory to progress through a new problem, the knowledge stored in long-term memory will make that process far more efficient and successful. … The more parts of the problem that we can automate and store in long-term memory, the more space we will have available in working memory to deal with the new parts of the problem.

A few years ago I defined a 'museum technologist' as 'someone who can appropriately apply a range of digital solutions to help meet the goals of a particular museum project', and deep domain knowledge clearly has a role to play in this (also in the kinds of critical thinking that will save technologists from being unthinking cheerleaders for the newest buzzword or geek toy). 

There's a long history of hard-won wisdom, design patterns and knowledge (whether about ways not to tender for or specify software, reasons why proposed standards may or may not work, translating digital methods and timelines for departments raised on print, etc – I'm sure you all have examples) contained in the individual and collective memory of individual technologists and teams. Some of it is represented in museum technology mailing lists, blogs or conference proceedings, but the lessons learnt in the past aren't always easily discoverable by people encountering digital heritage issues for the first time. And then there's the issue of working out which knowledge relates to specific, outdated technologies and which still holds while not quashing the enthusiasm of new people with a curt 'we tried that before'…

Something in the juxtaposition of the 20th anniversary of BritPop and the annual wave of enthusiasm and discovery from the international Museums and the Web (#MW2014) conference prompted me to look at what the Museums Computer Group (MCG) and Museum Computer Network (MCN) lists were talking about in April five and ten years ago (i.e. in easily-accessible archives):

Five years ago in #musetech – open web, content distribution, virtualisation, wifi https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind0904&L=mcg&X=498A43516F310B2193 http://mcn.edu/pipermail/mcn-l/2009-April/date.html

Ten years ago in #musetech people were talking about knowledge organisation and video links with schools https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind04&L=mcg&F=&S=&X=498A43516F310B2193

Some of the conversations from that random sample are still highly relevant today, and more focused dives into various archives would probably find approaches and information that'd help people tackling current issues.

So how can we help people new to the sector find those previous conversations and get some of this long-term memory into their own working memory? Pointing people to search forms for the MCG and MCN lists is easy, some of the conference proceedings are a bit trickier (e.g. searching within museumsandtheweb.com), and there's no central list of museum technology blogs that I know of. Maybe people could nominate blog posts they think stand the test of time, mindful of the risk of it turning into a popularity/recency thing?

If you're new(ish) to digital heritage, how did you find your feet? Which sites or communities helped you, and how did you find them? Or if you have a new team member, how do you help them get up to speed with museum technology? Or looking further afield, which resources would you send to someone from academia or related heritage fields who wanted to learn about building heritage resources for or with specialists and the public?

'Go digital' at Museums Association 2012 Conference

Some people who couldn't make the Museums Association conference (or #museums2012) asked for more information on the session on digital strategies, so here are my introductory remarks and some scribbled highlights of the speakers' papers and discussion with the audience.

Update: a year later, I've thought of a 'too long, didn't read' version: digital strategies are like puberty. Everyone has to go through it, but life's better on the other side when you've figured things out. Digital should be incorporated into engagement, collections, venue etc strategies – it's not a thing on its own.

The speakers were Carolyn Royston (@caro_ft), Head of New Media at Imperial War Museum; Hugh Wallace (@tumshie), Head of Digital Media at National Museums Scotland; Michael Woodward (@michael1665), Commercial Director at York Museums Trust, and I chaired the session in my role as Chair of the Museums Computer Group. From the conference programme: 'This session explores the importance of developing a digital strategy. It will provide insight into how organisations can incorporate digital into a holistic approach that meets wider organisational and public engagement objectives and look at how to use digital engagement as a catalyst to drive organisational change.'

After various conversations about digital and museums with people who were interested in the session, I updated my introduction so that overall the challenge of embracing the impact of digital technologies, platforms and audiences on museums was put in a positive light.  The edited title that appeared in the programme had a different emphasis ('Go digital' rather than the 'Getting strategic about digital' we submitted) so I wanted it to be clear that we weren't pushing a digital agenda for the sake of technology itself. Or as I apparently said at the time, "it's not about making everything digital, it's about dealing with the fact that digital is everywhere".

I started by asking people to raise their hands if their museum had a digital strategy, and I'd say well over half the room responded, which surprised me. Perhaps a third were in the process of planning for a digital strategy and just a few were yet to start at all.

My notes were something like this: "we probably all know by now that digital technologies bring wonderful opportunities for museums and their audiences, but you might also be worried about the impact of technology on audiences and your museum. ‘Digital’ varies in organisations – it might encompass social media, collections, mobile, marketing, in-gallery interactives, broadcast and content production. It touches every public-facing output of the museum as well as back-office functions and infrastructure.

You can’t avoid the impact of digital on your organisation, so it’s about how you deal with it, how you integrate it into the fabric of your museum. As you’ll hear in the case studies, implementing a digital strategy itself changes the organisation, so from the moment you start talking to people about devising a digital strategy, you'll be making progress. For some of our presenters, their digital strategy ultimately took the form of a digital vision document – the strategy itself is embedded in the process and in the resulting framework for working across the organisation. A digital strategy framework allows you to explore options in conversation with the whole organisation; it’s not about making everything digital.

Our case studies come from three very different organisations working with different collections in different contexts. Mike, Commercial Director at York Museums Trust, will talk about planning the journey, moving from ad hoc work to making digital integral to how the organisation works; Hugh, Head of Digital Media at National Museums Scotland, will discuss the process they went through to develop their digital strategy, what’s worked and what hasn’t; and Carolyn Royston, Head of Digital Media at Imperial War Museums, who comes from a learning background, will talk about IWM’s digital adventure, from where they started to where they are now. They’re each at different stages of the process of implementing and living with a digital strategy.

Based on our discussions as we planned this session, the life cycle of a digital strategy in a museum seems to be: aspiration, design, education and internal outreach, integration with other strategies (particularly public engagement) and sign off… then take a deep breath, look at what the ripple effect has been and start updating your strategies as everything will have changed since you started. And with that, over to Mike…"

Mike talked about working out when digital delivery really makes sense, whether for inaccessible objects (like a rock on Mars) or a delicate book; the major role that outreach and communication play in the process of creating a digital strategy; appointing the staff that would deliver it based on eagerness, enthusiasm and teamwork rather than pure tech skills; where digital teams should sit in the organisation; and about the possibility of using digital volunteers (or 'armchair experts') to get content online.

Hugh went for 'frameworks, not fireworks', pointing out that what happens after the strategy is written is important so you need to create a flexible framework to manage the inevitable change.  He discussed the importance of asking the right-sized question (as in one case, where 'we didn't know at the start that an app would be the answer') and working on getting digital into 'business as usual' rather than an add-on team with specialist skills.  Or as one tweeter summarised, 'work across depts, don't get hung up on the latest tech, define users realistically and keep it simple'.

Carolyn covered the different forms of digital engagement and social media the IWM have been trying and the role of creating their digital vision in helping overcome their fears; the benefits of partnerships with other organisations for piggybacking on their technology, networks and audiences; and the fact that their collections sales have gone up as a result of opening up their collections. In the questions, someone described using intellectual property restrictions to try to monetise collections as 'fool's gold' – great term! I think we should have a whole conference session on this sometime soon.

When reviewing our discussions beforehand I'd found a note from a planning call which summed up how much the process should change the organisation: 'if you're not embarrassed by your digital strategy six months after sign-off you probably haven't done it right', and on the day the speakers reinforced my impression that ultimately, devising and implementing a digital strategy is (probably) a necessary process to go through but it's not a goal in its own right.  The IWM and NMS examples show that the internal education and conversations can both create a bigger appetite for digital engagement and change organisational expectations around digital to the point where it has to be more widely integrated.  The best place for a digital strategy is within a public engagement strategy that integrates the use of digital platforms and working methods into the overall public-facing work of the museum.

Listening to the speakers, a new metaphor occurred to me: is implementing a digital strategy like gardening? It needs constant care and feeding after the big job of sowing seeds is over. And much like gardening for pleasure (in the UK, anyway), the process may have more impact than the product.

And something I didn't articulate at the time – if the whole museum is going to be doing some digital work, we technologists are going to have to be patient and generous in sharing our knowledge and helping everyone learn how to make sensible decisions about digital content and experiences.  If we don't, we risk being a bottleneck or forcing people to proceed based on guesswork and neither are good for museums or their audiences.

So much awesomeness! #GODIGITAL #Museums2012 twitter.com/dannybirchall/…
— Danny Birchall (@dannybirchall) November 9, 2012

Huge thanks to Carolyn, Hugh and Michael for making the whole thing such a pleasure, and to the Museums Association conference organisers for the opportunity to share our thoughts and experiences.

And finally, if you're interested in digital strategies in heritage organisations, the Museums Computer Group's annual Museums on the Web conference is all about being 'strategically digital' (which, as you might have guessed from the above, sometimes means not using technology at all) – but UKMW12 tickets are selling out fast, so don't delay.