2014 – Open Objects

The rise of interpolated content?

One thing that might stand out when we look back at 2014 is the rise of interpolated content. We've become used to translating around auto-correct errors in texts and emails but we seem to be at a tipping point where software is going ahead and rewriting content rather than prompting you to notice and edit things yourself.

iOS doesn't just highlight or fix typos, it changes the words you've typed. To take one example, iOS users might use 'ill' more than they use 'ilk', but if I typed 'ilk' I'm not happy when it's replaced by an algorithmically-determined 'ill'. As a side note, understanding the effect of auto-correct on written messages will be a challenge for future historians (much as it is for us sometimes now).

And it's not only text. In 2014, Adobe previewed GapStop, 'a new video technology that eases transitions and removes pauses from video automatically'. It's not just editing out pauses, it's creating filler images from existing images to bridge the gaps so the image doesn't jump between cuts. It makes it a lot harder to tell when someone's words have been edited to say something different to what they actually said – again, editing audio and video isn't new, but making it so easy to remove the artefacts that previously provided clues to the edits is.

Photoshop has long let you edit the contrast and tone in images, but now their Content-Aware Move, Fill and Patch tools can seamlessly add, move or remove content from images, making it easy to create 'new' historical moments. The images on extrapolated-art.com, which uses '[n]ew techniques in machine learning and image processing […] to extrapolate the scene of a painting to see what the full scenery might have looked like' show the same techniques applied to classic paintings.

But photos have been manipulated since they were first used, so what's new? As one Google user reported in It’s Official: AIs are now re-writing history, 'Google’s algorithms took the two similar photos and created a moment in history that never existed, one where my wife and I smiled our best (or what the algorithm determined was our best) at the exact same microsecond, in a restaurant in Normandy.' The important difference here is that he did not create this new image himself: Google's scripts did, without asking or specifically notifying him. In twenty years' time, this fake image may become part of his 'memory' of the day. Automatically generated content like this also takes the question of intent entirely out of the process of determining 'real' from interpolated content. And if software starts retrospectively 'correcting' images, what does that mean for our personal digital archives, for collecting institutions and for future historians?

Interventions between the act of taking a photo and posting it on social media might be one of the trends of 2015. Facebook are about to start 'auto-enhancing' your photos, and apparently, Facebook Wants To Stop You From Uploading Drunk Pictures Of Yourself. Apparently this is to save your mum and boss seeing them; the alternative path of building a social network that doesn't show everything you do to your mum and boss was lost long ago. Would the world be a better place if Facebook or Twitter had a 'this looks like an ill-formed rant, are you sure you want to post it?' function?

So 2014 seems to have brought the removal of human agency from the process of enhancing, and even creating, text and images. Algorithms writing history? Where do we go from here? How will we deal with the increase of interpolated content when looking back at this time? I'd love to hear your thoughts.

Three ways you can help with 'In their own words: collecting experiences of the First World War' (and a CENDARI project update)

Somehow it's a month since I posted about my CENDARI research project (in Moving forward: modelling and indexing WWI battalions) on this site. That probably reflects the rhythm of the project – less trying to work out what I want to do and more getting on with doing it. A draft post I started last month simply said, 'A lot of battalions were involved in World War One'. I'll do a retrospective post soon, and here's a quick summary of on-going work.

First, a quick recap. My project has two goals – one, to collect a personal narrative for each battalion in the Allied armies of the First World War; two, to create a service that would allow someone to ask 'where was a specific battalion at a specific time?'. Together, they help address a common situation for people new to WWI history who might ask something like 'I know my great-uncle was in the 27th Australian battalion in March 1916, where would he have been and what would he have experienced?'.
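The second goal can be pictured as a lookup over dated location records. This is only a minimal sketch, assuming each unit has a list of periods spent at a place; the `Posting` class, its field names and the placeholder records are hypothetical and are not taken from the project wiki.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Posting:
    """One period during which a unit was at a place (hypothetical structure)."""
    unit: str
    place: str
    start: date
    end: date  # inclusive


def where_was(postings: list[Posting], unit: str, on: date) -> Optional[str]:
    """Return the place recorded for `unit` on the date `on`, or None if unknown."""
    for posting in postings:
        if posting.unit == unit and posting.start <= on <= posting.end:
            return posting.place
    return None


# Placeholder records for illustration only; not real historical data.
POSTINGS = [
    Posting("Example Battalion", "Place A", date(1916, 1, 1), date(1916, 2, 29)),
    Posting("Example Battalion", "Place B", date(1916, 3, 1), date(1916, 4, 30)),
]

print(where_was(POSTINGS, "Example Battalion", date(1916, 3, 15)))
```

Returning `None` for dates with no record keeps 'we don't know' distinct from any real place name, which matters when the underlying records are patchy.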

I've been working on streamlining and simplifying the public-facing task of collecting a personal narrative for each battalion, and have written a blog post, Help collect soldiers’ experiences of WWI in their own words, that reduces it to three steps:

Take one of the diaries, letters and memoirs listed on the Collaborative Collections wiki, and
Match its author with a specific regiment or battalion.
Send in the results via this form.

If you know of a local history society, family historian or anyone else who might be interested in helping, please send them along to this post: Help collect soldiers’ experiences of WWI in their own words.

Work on specifying the relevant data structures to support a look-up service to answer questions about a specific unit's location and activities at a specific time has largely moved to the wiki:

Talk:British battalions and regiments in World War I
Talk:British Army Hierarchies
Template talk:Battalion – what information should be recorded on every battalion/unit page?
Template talk:Infobox command structure – what structured data should be recorded about military hierarchies?
Template talk:Infobox theatre of war/doc – what structured data should be recorded about a unit's activities and engagements in the war?
Template talk:Infobox military unit – what structured data should be recorded about a battalion/unit?

You can see the infobox structures in progress by flipping from the talk to the Template tabs. You'll need to request an account to join in but more views, sample data and edge cases would be really welcome.

Populating the list of battalions and other units has been a huge task in itself, partly because very few cultural institutions have definitive lists of units they can (or want to) share, but it's necessary to support both core goals. I've been fortunate to have help (see 'Thanks and recent contributions' on 'How you can help') but the task is on-going so get in touch if you can help!

So there are three different ways you can help with 'In their own words: collecting experiences of the First World War':

collect diaries linked to specific battalions;
help check or complete the lists of Australian battalions, British battalions and regiments, Canadian battalions and regiments, Indian battalions, Italian battalions and New Zealand battalions in World War I;
review and contribute to the data structures needed to record information about military units in the Talk and Template pages above.

Finally, last week I was in New Zealand to give a keynote on this work at the National Digital Forum. The video for 'Collaborative collections through a participatory commons' is online, so you can catch up on the background for my project if you've got 40 minutes or so to spare. Should you be in Dublin, I'm giving a talk on 'A pilot with public participation in historical research: linking lived experiences of the First World War' at the Trinity Long Room Hub today (thus the poster).

And if you've made it this far, perhaps you'd like to apply for one of the CENDARI Visiting Research Fellowships 2015 yourself?

All the things I didn't say in my welcome to UKMW14 'Museums beyond the web'…

Here are all the things I (probably) didn't say in my Chair's welcome for the Museums Computer Group annual conference… Other notes, images and tweets from the day are linked from 'UKMW14 round-up: posts, tweets, slides and images'.

Welcome to MCG's UKMW14: Museums beyond the web! We've got great speakers lined up, and we've built in lots of time to catch up and get to know your peers, so we hope you'll enjoy the day.

It's ten years since the MCG's Museums on the Web became an annual event, and it's 13 years since it was first run in 2001. It feels like a lot has changed since then, but, while the future is very definitely here, it's also definitely not evenly distributed across the museum sector. It's also an interesting moment for the conference, as 'the web' has broadened to include 'digital', which in turn spans giant distribution networks and tiny wearable devices. 'The web' has become a slightly out-dated shorthand term for 'audience-facing technologies'.

When looking back over the last ten years of programmes, I found myself thinking about planetary orbits. Small planets closest to the sun whizz around quickly, while the big gas giants move incredibly slowly. If technology start-ups are like Mercury, completing a year in just 88 Earth days, and our audiences are firmly on Earth time, museum time might be a bit closer to Mars, taking two Earth years for each Mars year, or sometimes even Jupiter, completing a circuit once every twelve years or so.

But museums aren't planets, so I can only push that metaphor so far. Different sections of a museum move at different speeds. While heroic front of house staff can observe changes in audience behaviours on a daily basis and social media platforms can be adopted overnight, websites might be redesigned every few years, but galleries are only updated every few decades (if you're lucky). For a long time it felt like museums were using digital platforms to broadcast at audiences without really addressing the challenges of dialogue or collaborating with external experts.

But at this point, it seems that, finally, working on digital platforms like the web has pushed museums to change how they work. On a personal level, the need for specific technical skills hasn't changed, but more content, education and design jobs work across platforms, are consciously 'multi-channel' and audience rather than platform-centred in their focus. Web teams seem to be settling into public engagement, education, marketing etc departments as the idea of a 'digital' department slowly becomes an oxymoron. Frameworks from software development are slowly permeating organisations that used to think in terms of print runs and physical gallery construction. Short rounds of agile development are replacing the 'build and abandon after launch' model, voices from a range of departments are replacing the disembodied expert voice, and catalogues are becoming publications that change over time.

While many of us here are comfortable with these webby methods, how will we manage the need to act as translators between digital and museums while understanding the impact of new technologies? And how can we help those who are struggling to keep up, particularly with the impact of the cuts?

Today is a chance to think about the technologies that will shape the museums of the future. What will audiences want from us? Where will they go looking for information and expertise, and how much of that information and expertise should be provided by museums? How can museums best provide access to their collections and knowledge over the next five, ten years?

We're grateful to our sponsors, particularly as their support helps keep ticket prices affordable. Firstly I'd like to thank our venue sponsors, the Natural History Museum. Secondly, I'd like to thank Faversham & Moss for their sponsorship of this conference. Go chat to them and find out more about their work!

Moving forward: modelling and indexing WWI battalions

A super-quick update from my CENDARI Fellowship this week. I set up the wiki for In their own words: linking lived experiences of the First World War a week ago but only got stuck into populating it with lists of various national battalions this week. My current task list, copied from the front page, is to:

Populate list of military units: Australian battalions in World War I, British battalions and regiments in World War I, Canadian battalions in World War I, Indian battalions in World War I, Italian battalions in World War I, New Zealand battalions in World War I. A list of battalions is needed to form the basis for the collecting process. (I'm starting with a list of divisions because I can get it from Wikipedia, but I know this is problematic)
Collate lists of personal diaries, letters, memoirs that can be linked to units through their authors
Collate lists of official unit diaries and histories
Collate resources on researching World War One records to help researchers know where to start
Create a sample battalion page as a demonstrator to show how personal accounts can be linked
Collate information about private letters, diaries and memoirs

If you can help with any of that, let me know! Or just get stuck in and edit the site.

I've started another Google Doc with very sketchy Notes towards modelling information about World War One Battalions. I need to test it with more battalion histories and update it iteratively. At this stage my thinking is to turn it into an InfoBox format to create structured data via the wiki. It's all very lo-fi and much less designed than my usual projects, but I'm hoping people will be able to help regardless.

So, in this phase of the project, the aim is to find a personal narrative – a diary, letters, memoirs or images – for each military unit in the British Army. Can you help?

Looking for (crowdsourcing) love in all the right places

One of the most important exercises in the crowdsourcing workshops I run is the 'speed dating' session. The idea is to spend some time looking at a bunch of crowdsourcing projects until you find a project you love. Finding a project you enjoy gives you a deeper insight into why other people participate in crowdsourcing, and will see you through the work required to get a crowdsourcing project going. I think making a personal connection like this helps reduce some of the cynicism I occasionally encounter about why people would volunteer their time to help cultural heritage collections. Trying lots of projects also gives you a much better sense of the types of barriers projects can accidentally put in the way of participation. It's also a good reminder that everyone is a nerd about something, and that there's a community of passion for every topic you can think of.

If you want to learn more about designing history or cultural heritage crowdsourcing projects, trying out lots of projects is a great place to start. The more time you can spend on this the better – an hour is ideal – but trying just one or two projects is better than nothing. In a workshop I get people to note how a project made them feel – what they liked most and least about a project, and who they'd recommend it to. You can also note the input and output types to help build your mental database of relevant crowdsourcing projects.

The list of projects I suggest varies according to the background of workshop participants, and I'll often throw in suggestions tailored to specific interests, but here's a generic list to get you started.

10 Most Wanted http://10most.org.uk/	Research object histories
Ancient Lives http://ancientlives.org/	Humanities, language, text transcription
British Library Georeferencer http://www.bl.uk/maps/	Locating and georeferencing maps (warning: if it's running, only hard maps may be left!)
Children of the Lodz Ghetto http://online.ushmm.org/lodzchildren/	Citizen history, research
Describe Me http://describeme.museumvictoria.com.au/	Describe objects
DIY History http://diyhistory.lib.uiowa.edu/	Transcribe historical letters, recipes, diaries
Family History Transcription Project http://www.flickr.com/photos/statelibrarync/collections/	Document transcription (Flickr/Yahoo login required to comment)
Herbaria@home http://herbariaunited.org/atHome/ (for bonus points, compare it with Notes from Nature https://www.zooniverse.org/project/notes_from_nature)	Transcribing specimen sheets (or biographical research)
HistoryPin Year of the Bay 'Mysteries' https://www.historypin.org/attach/project/22-yearofthebay/mysteries/index/	Help find dates, locations, titles for historic photographs; overlay images on StreetView
iSpot http://www.ispotnature.org/	Help 'identify wildlife and share nature'
Letters of 1916 http://dh.tcd.ie/letters1916/	Transcribe letters and/or contribute letters
London Street Views 1840 http://crowd.museumoflondon.org.uk/lsv1840/	Help transcribe London business directories
Micropasts http://crowdsourced.micropasts.org/app/photomasking/newtask	Photo-masking to help produce 3D objects; also structured transcription
Museum Metadata Games: Dora http://museumgam.es/dora/	Tagging game with cultural heritage objects (my prototype from 2010)
NYPL Building Inspector http://buildinginspector.nypl.org/	A range of tasks, including checking building footprints, entering addresses
Operation War Diary http://operationwardiary.org/	Structured transcription of WWI unit diaries
Papers of the War Department http://wardepartmentpapers.org/	Document transcription
Planet Hunters http://planethunters.org/	Citizen science; review visualised data
Powerhouse Museum Collection Search http://www.powerhousemuseum.com/collection/database/menu.php	Tagging objects
Reading Experience Database http://www.open.ac.uk/Arts/RED/	Text selection, transcription, description.
Smithsonian Digital Volunteers: Transcription Center https://transcription.si.edu/	Text transcription
Tiltfactor Metadata Games http://www.metadatagames.org/	Games with cultural heritage images
Transcribe Bentham http://www.transcribe-bentham.da.ulcc.ac.uk/	History; text transcription
Trove http://trove.nla.gov.au/newspaper?q=	Correct OCR errors, transcribe text, tag or describe documents
US National Archives http://www.amara.org/en/teams/national-archives/	Transcribing videos
What's the Score at the Bodleian http://www.whats-the-score.org/	Music and text transcription, description
What's on the menu http://menus.nypl.org/	Structured transcription of restaurant menus
What's on the menu? Geotagger http://menusgeo.herokuapp.com/	Geolocating historic restaurant menus
Wikisource – random item link http://en.wikisource.org/wiki/Special:Random/Index	Transcribing texts
Worm Watch http://www.wormwatchlab.org	Citizen science; video
Your Paintings Tagger http://tagger.thepcf.org.uk/	Paintings; free-text or structured tagging

NB: crowdsourcing is a dynamic field, so some sites may be temporarily out of content or have otherwise settled in transit. Some sites require registration, so you may need to find another site to explore while you're waiting for your registration email.

In which I am awed by the generosity of others, and have some worthy goals

A quick update from my CENDARI fellowship working on a project that's becoming 'In their own words: linking lived experiences of the First World War'. I've spent the week reading (again a mixture of original diaries and letters, technical stuff like ontology documentation and also WWI history forums and 'amateur' sites) and writing. I put together a document outlining a range of possible goals and some very sketchy tech specs, and opened it up for feedback. The goals I set out are copied below for those who don't want to delve into detail. The commentable document, 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions, goes into more detail.

However, the main point of this post is to publicly thank those who've helped by commenting and sharing on the doc, on twitter or via email. Hopefully I'm not forgetting anyone, as I've been blown away by and am incredibly grateful for the generosity of those who've taken the time to at least skim 1600 words (!). It's all helped me clarify my ideas and find solutions I'm able to start implementing next week. In no order at all – at CENDARI, Jennifer Edmond, Alex O'Connor, David Stuart, Benjamin Štular, Francesca Morselli, Deirdre Byrne; online Andrew Gray @generalising; Alex Stinson @ DHKState; jason webber @jasonmarkwebber; Alastair Dunning @alastairdunning; Ben Brumfield @benwbrum; Christine Pittsley; Owen Stephens @ostephens; David Haskiya @DavidHaskiya; Jeremy Ottevanger @jottevanger; Monika Lechner @lemondesign; Gavin Robinson @merozcursed; Tom Pert @trompet2 – thank you all!

Worthy goals (i.e. things I'm hoping to accomplish, with the help of historians and the public; only some of which I'll manage in the time)

At the end of this project, someone who wants to research a soldier in WWI but doesn't know a thing about how armies were structured should be able to find a personal narrative from a soldier in the same bit of the army, to help them understand experiences of the Great War.

Hopefully these personal accounts will provide some context, in their own words, for the lived experiences of WWI. Some goals listed are behind-the-scenes stuff that should just invisibly make personal diaries, letters and memoirs more easily discoverable. It needs datasets that provide structures that support relationships between people and documents; participatory interfaces for creating or enhancing information about contemporary materials (which feed into those supporting structures), and interfaces that use the data created.

More specifically, my goals include:

A personal account by someone in each unit linked to that unit's record, so that anyone researching a WWI name would have at least one account to read. To populate this dataset, personal accounts (diaries, letters, etc) would need to be linked to specific soldiers, who can then be linked to specific units. Linking published accounts such as official unit histories would be a bonus. [Semantic MediaWiki]
Researched links between individual men and the units they served in, to allow their personal accounts to be linked to the relevant military unit. I'm hoping I can find historians willing to help with the process of finding and confirming the military unit the writer was in. [Semantic MediaWiki]
A platform for crowdsourcing the transcription and annotation of digitised documents. The catch is that the documents for transcription would be held remotely on a range of large and small sites, from Europeana's collection to library sites that contain just one or two digitised diaries. Documents could be tagged/annotated with the names of people, places, events, or concepts represented in them. [Semantic MediaWiki??]
A structured dataset populated with the military hierarchy (probably based on The British order of battle of 1914-1918) that records the start and end dates of each parent-child relationship (an example of how much units moved within the hierarchy)
A published webpage for each unit, to hold those links to official and personal documents about that unit in WWI. In future this page could include maps, timelines and other visualisations tailored to the attributes of a unit, possibly including theatres of war, events, campaigns, battles, number of privates and officers, etc. (Possibly related to CENDARI Work Package 9?) [Semantic MediaWiki]
A better understanding of what people want to know at different stages of researching WWI histories. This might include formal data gathering, possibly a combination of interviews, forum discussions or surveys.
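The dated parent-child relationships in the military hierarchy goal could be modelled along these lines. This is a rough sketch under my own assumptions rather than the project's actual data model: the `Attachment` class, the `chain_of_command` function and the unit names are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Attachment:
    """A dated parent-child link between two units (hypothetical structure)."""
    child: str
    parent: str
    start: date
    end: date  # inclusive


def chain_of_command(links: list[Attachment], unit: str, on: date) -> list[str]:
    """Walk upwards from `unit`, following the link that was active on `on`."""
    chain = [unit]
    current = unit
    while True:
        parent = next(
            (link.parent for link in links
             if link.child == current and link.start <= on <= link.end),
            None,
        )
        if parent is None or parent in chain:  # stop at the top, or on a cycle
            return chain
        chain.append(parent)
        current = parent


# Placeholder units and dates for illustration only; not real historical data.
LINKS = [
    Attachment("Battalion X", "Brigade A", date(1915, 1, 1), date(1916, 6, 30)),
    Attachment("Battalion X", "Brigade B", date(1916, 7, 1), date(1918, 11, 11)),
    Attachment("Brigade A", "Division 1", date(1915, 1, 1), date(1918, 11, 11)),
    Attachment("Brigade B", "Division 2", date(1915, 1, 1), date(1918, 11, 11)),
]

print(chain_of_command(LINKS, "Battalion X", date(1916, 3, 1)))
print(chain_of_command(LINKS, "Battalion X", date(1917, 3, 1)))
```

Storing the dates on the link rather than on the unit means a battalion that moved between brigades simply gets two rows, and the same query works for any date.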

Goals that are more likely to drop off, or become quick experiments to see how far you can get with accessible tools:

Trained 'named entity recognition' and 'natural language processing' tools that could be run over transcribed text to suggest possible people, places, events, concepts, etc [this might drop off the list as the CENDARI project is working on a tool called Pineapple (PDF poster). That said, I'll probably still experiment with the Stanford NER tool to see what the results are like]
A way of presenting possible matches from the text tools above for verification or correction by researchers. Ideally, this would be tied in with the ability to annotate documents
The ability to search across different repositories for a particular soldier, to help with the above.
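As a very lo-fi stand-in for the text tools above, a plain gazetteer lookup can already suggest candidate entities for a researcher to verify. To be clear, this is a simple dictionary matcher, not trained named entity recognition and not the Stanford NER tool or Pineapple; the function name, the gazetteer structure and the sample sentence are invented for illustration.

```python
import re


def suggest_entities(text: str, gazetteer: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (matched name, entity type, character offset) for each gazetteer hit."""
    suggestions = []
    for name, entity_type in gazetteer.items():
        # Whole-word, case-insensitive match, so a name is not found inside a longer word.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            suggestions.append((match.group(0), entity_type, match.start()))
    # Sort by position so suggestions follow the reading order of the text.
    return sorted(suggestions, key=lambda suggestion: suggestion[2])


# A tiny hand-made gazetteer; a real one would come from curated lists of places and units.
GAZETTEER = {
    "Gallipoli": "place",
    "Auckland Mounted Rifles": "unit",
}

# Invented sample sentence, not a quotation from any diary or letter.
SAMPLE = "Joined the Auckland Mounted Rifles before we sailed for Gallipoli."

for name, entity_type, offset in suggest_entities(SAMPLE, GAZETTEER):
    print(offset, entity_type, name)
```

Each suggestion keeps its character offset so that an interface could highlight the span and let a researcher accept or correct it.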

Commentable version: 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions.

Linking lived experiences of WWI through battalions?

Another update from my CENDARI Fellowship at Trinity College Dublin, looking at 'In their own words: linking lived experiences of the First World War', which is a small-scale, short-term pilot based on WWI collections. My first post is Defining the scope: week one as a CENDARI Fellow. Over the past two weeks I've done a lot of reading – more WWI diaries and letters; WWI histories and historiography; specialist information like military structures (orders of battle, etc). I've also sketched out lots of snippets of possible functions, data, relationships and other outcomes.

I've narrowed the key goal (or minimum viable product, if you prefer) of my project to linking personal accounts of the war – letters, diaries, memoirs, photographs, etc – to battalions, by creating links from the individual who wrote them to their military unit. Once these personal accounts are linked to particular military units, they can be linked to higher units – from the battalion, ship or regiment to brigade, corps, etc – and to particular places, activities, events and campaigns. The idea behind this is to provide context for an individual's experience of WWI by linking to narratives written by people in the same situation. I'm still working out how to organise the research process of matching the right soldier to the right battalion/regiment/ship so that relevant personal stories are discoverable. I'm also still working out which attributes of a battalion are relevant, how granular the data will be, and how to design for the inevitable variation in data quality (for example, the availability of records for different armies varies hugely). Finally, I’m still working out which bits need computer science tools and which need the help of other historians.
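The core linking step described above (account to author, author to unit) might look something like this. It is only a sketch under assumed structures: the identifiers, the two dictionaries and the `accounts_by_unit` function are hypothetical, and real records would need dates and sources attached to each link.

```python
from collections import defaultdict

# Hypothetical identifiers for illustration only; not real records.
AUTHOR_OF = {
    "diary-001": "person-A",
    "letters-002": "person-B",
    "memoir-003": "person-A",
    "diary-004": "person-C",
}
SERVED_IN = {
    "person-A": "Battalion X",
    "person-B": "Battalion Y",
    # person-C's unit has not been researched yet, so there is no entry.
}


def accounts_by_unit(author_of: dict[str, str], served_in: dict[str, str]) -> dict[str, list[str]]:
    """Group personal accounts under the unit their author served in."""
    index = defaultdict(list)
    for account, person in author_of.items():
        unit = served_in.get(person)
        if unit is not None:  # skip accounts whose author is not yet matched to a unit
            index[unit].append(account)
    return dict(index)


print(accounts_by_unit(AUTHOR_OF, SERVED_IN))
```

Keeping the author-to-unit link separate from the account-to-author link means the research step of confirming someone's unit only has to be done once per person, however many documents they left behind.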

Given the number of centenary projects, I was hoping to find more structured data about WWI entities. Trenches to Triples would be a useful source of permanent URLs, and terms to train named entity recognition, but am I missing other sources?

There's a lot of content, and so much activity around WWI records, but it's spread out across the internet. Individual people and small organisations are digitising and transcribing diaries and letters. Big collecting projects like Europeana have lots of personal accounts, but they're often not transcribed and they don't seem to be linked to structured data about the item itself. Some people have painstakingly transcribed unit diaries, but they're not linked from the official site, so others wouldn't know there's a more easily read version of the diary available. I've been wondering if you could crowdsource the process of transcribing records held elsewhere, and offer the transcripts back to sites. Using dedicated transcription software would let others suggest corrections, and might also make it possible to link sections of the text to external 'entities' like names, places, events and concepts.

Albert Henry Bailey. Image: Sir George Grey Special Collections, Auckland Libraries, AWNS-19150909-39-5

To help figure out the issues researchers face and the variations in available resources, I'm researching randomly selected soldiers from different Allied forces. I've posted my notes on Private Albert Henry Bailey, service number 13/970a. You'll see that they're in prose form, and don't contain any structured data. Most of my research used digitised-but-not-transcribed images of documents, with some transcribed accounts. It would definitely benefit from deeper knowledge of military history – for a start, which battalions were in the same place as his unit at the same time?

This account of the arrival and first weeks of the Auckland Mounted Rifles at Gallipoli from the official unit history gives a sense of the density and specificity of local place names, as does the official unit diary, and I assume many personal accounts. I'm not sure how named entity recognition tools will cope, and ideally I'd like to find lists of places to 'train' the tools (including possibly some from the 'Trenches to Triples' project).

If there aren't already any structured data sources for military hierarchies in WWI, do I have to make one? And if so, how? The idea would be to turn prose descriptions like this Australian War Memorial history of the 27th AIF Battalion, this order of battle of the 2nd Australian Division and any other suitable sources into structured data. I can see some ways it might be possible to crowdsource the task, but it's a big task. But it's worth it – providing a service that lets people look up which higher military units, places, activities and campaigns a particular battalion/regiment/ship was linked to at a given time would be a good legacy for my research.

I'm sure I'm forgetting lots of things, and my list of questions is longer than my list of answers, but I should end here. To close, I want to share a quote from the official history of the Auckland Mounted Rifles. The author said he 'would like to speak of the splendid men of the rank and file who died during this three months' struggle. Many names rush to the memory, but it is not possible to mention some without doing an injustice to the memory of others'. I guess my project is driven by a vision of doing justice to the memory of every soldier, particularly those ordinary men who aren't as easily found in the records. I'm hoping that drawing on the work of other historians and re-linking disparate sources will help provide as much context as possible for their experiences of the First World War.

—
Update, 15 October 2014: if you've made it this far, you might also be interested in chipping in at 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions.

It's here! Crowdsourcing our Cultural Heritage is now available

My edited volume, Crowdsourcing our Cultural Heritage, is now available! My introduction (Crowdsourcing our cultural heritage: Introduction), which provides an overview of the field and outlines the contribution of the 12 chapters, is online at Ashgate's site, along with the table of contents and index. There's a 10% discount if you order online.

If you're in London on the evening of Thursday 20th November, we're celebrating with a book launch party at the UCL Centre for Digital Humanities. Register at http://crowdsourcingculturalheritage.eventbrite.co.uk.

Here's the back page blurb: "Crowdsourcing, or asking the general public to help contribute to shared goals, is increasingly popular in memory institutions as a tool for digitising or computing vast amounts of data. This book brings together for the first time the collected wisdom of international leaders in the theory and practice of crowdsourcing in cultural heritage. It features eight accessible case studies of groundbreaking projects from leading cultural heritage and academic institutions, and four thought-provoking essays that reflect on the wider implications of this engagement for participants and on the institutions themselves.

Crowdsourcing in cultural heritage is more than a framework for creating content: as a form of mutually beneficial engagement with the collections and research of museums, libraries, archives and academia, it benefits both audiences and institutions. However, successful crowdsourcing projects reflect a commitment to developing effective interface and technical designs. This book will help practitioners who wish to create their own crowdsourcing projects understand how other institutions devised the right combination of source material and the tasks for their ‘crowd’. The authors provide theoretically informed, actionable insights on crowdsourcing in cultural heritage, outlining the context in which their projects were created, the challenges and opportunities that informed decisions during implementation, and reflecting on the results.

This book will be essential reading for information and cultural management professionals, students and researchers in universities, corporate, public or academic libraries, museums and archives."

Massive thanks to the following authors of chapters for their intellectual generosity and their patience with up to five rounds of edits, plus proofing, indexing and more…

Crowdsourcing in Brooklyn, Shelley Bernstein;
Old Weather: approaching collections from a different angle, Lucinda Blaser;
‘Many hands make light work. Many hands together make merry work’: Transcribe Bentham and crowdsourcing manuscript collections, Tim Causer and Melissa Terras;
Build, analyse and generalise: community transcription of the Papers of the War Department and the development of Scripto, Sharon M. Leon;
What's on the menu?: crowdsourcing at the New York Public Library, Michael Lascarides and Ben Vershbow;
What’s Welsh for ‘crowdsourcing’? Citizen science and community engagement at the National Library of Wales, Lyn Lewis Dafis, Lorna M. Hughes and Rhian James;
Waisda?: making videos findable through crowdsourced annotations, Johan Oomen, Riste Gligorov and Michiel Hildebrand;
Your Paintings Tagger: crowdsourcing descriptive metadata for a national virtual collection, Kathryn Eccles and Andrew Greg;
Crowdsourcing: Crowding out the archivist? Locating crowdsourcing within the broader landscape of participatory archives, Alexandra Eveleigh;
How the crowd can surprise us: humanities crowdsourcing and the creation of knowledge, Stuart Dunn and Mark Hedges;
The role of open authority in a collaborative web, Lori Byrd Phillips;
Making crowdsourcing compatible with the missions and values of cultural heritage organisations, Trevor Owens.

Defining the scope: week one as a CENDARI Fellow

I'm coming to the end of my first week as a Transnational Access Fellow with the CENDARI project at the Trinity College Dublin Long Room Hub. CENDARI 'aims to leverage innovative technologies to provide historians with the tools by which to contextualise, customise and share their research', which dovetails with my PhD research incredibly well. This Fellowship gives me an opportunity to extend my ideas about 'Enriching cultural heritage collections through a Participatory Commons' without trying to squish them into a history thesis, and is probably perfectly timed in giving me a break from writing up.

View over Trinity College Dublin

There are two parts to my CENDARI project 'Bridging collections with a participatory Commons: a pilot with World War One archives'. The first involves working on the technical, data and cultural context/requirements for the 'participatory history commons' as an infrastructure; the second is a demonstrator based on that infrastructure. I'll be working out how official records and 'shoebox archives' can be mined and indexed to help provide what I'm calling 'computationally-generated context' for people researching lives touched by World War One.

This week I've read metadata schema (MODS extended with TEI and a local schema, if you're interested) and ontology guidelines, attended some lively seminars on Irish history, gotten my head around CENDARI's work packages and the structure of the British army during WWI. I've started a list of nearby local history societies with active research projects to see if I can find some working on WWI history – I'd love to work with people who have sources they want to digitise and generally do more with, and people who are actively doing research on First World War lives. I've started to read sample primary materials and collect machine-readable sources so I can test out approaches by manually marking-up and linking different repositories of records. I'm going to spend the rest of the day tidying up my list of outcomes and deliverables and sketching out how all the different aspects of my project fit together. And tonight I'm going to check out some of the events at Discover Research Dublin. Nerd joy!

'The cooperative archive'?

Finally, I've dealt with something I'd put off for ages. 'Commons' is one of those tricky words that's less resonant than it could be, so I looked for a better name than the 'participatory history commons'. I doodled around words like collation, congeries, cluster, demos, assemblage, sources, commons, active, engaged, participatory, opus, archive, digital, posse, mob, cahoots and phrases like collaborative collections, collaborative history, history cooperative, but eventually settled on 'cooperative archive'. This appeals because 'cooperative' encompasses attitudes or values around working together for a common purpose, and it includes those who share records and those who actively work to enhance and contextualise them. 'Archive' suggests primary sources, and can be applied to informal collections of 'shoebox archives' and the official holdings of museums, libraries and archives.

What do you think – does 'cooperative archive' work for you? Does your first reaction to the name evoke anything like my thoughts above?

Update, October 11: following some market testing on Facebook, it seems 'collaborative collections' best describes my vision.

These are a few of my favourite (audience research) things

On Friday I popped into London to give a talk at the Art of Digital meetup at the Photographers' Gallery. It's a great series of events organised by Caroline Heron and Jo Healy, so go along sometime if you can. I talked about different ways of doing audience research. (And when I wrote the line 'getting to know you' it gave me an earworm and a 'lessons from musicals' theme.) It was a talk of two halves. In the first, I outlined different ways of thinking about audience research; in the second, I went into a little more detail about a few of my favourite (audience research) things.

There are lots of different ways to understand the contexts and needs different audiences bring to your offerings. You probably also want to test to see if what you're making works for them and to get a sense of what they're currently doing with your websites, apps or venues. It can help to think of research methods along scales of time, distance, numbers, 'density' and intimacy. (Or you could think of it as a journey from 'somewhere out there' to 'dancing cheek to cheek'…)

'Time' refers to both how much time a method asks from the audience and how much time it takes to analyse the results. There's no getting around the fact that nearly all methods require time to plan, prepare and pilot, sorry! You can run 5 second tests that ask remote visitors a single question, or spend months embedded in a workplace shadowing people (and more time afterwards analysing the results). On the distance scale, you can work with remote testers located anywhere across the world, ask people visiting your museum to look at a few prototype screens, or physically locate yourself in someone's office for an interview or observation.

Numbers and 'density' (or the richness of communication and the resulting data) tend to be inversely linked. Analytics or log files let you gather data from millions of website or app users; one-question surveys can garner thousands of responses; and you can interview dozens of people or test prototypes with 5-8 users each time. However, the conversations you'll have in a semi-structured interview are much richer than the responses you'll get to a multiple-choice questionnaire. This is partly because it's a two-way dialogue, and partly because in-person interviews convey more information, including tone of voice, physical gestures, impressions of a location and possibly even physical artefacts or demonstrations. Generally, methods that can reach millions of remote people produce lots of point data, while more intimate methods that involve spending lots of time with just a few people produce small datasets of really rich data.

So here are a few of my favourite things: analytics, one-question surveys, 5 second tests, lightweight usability tests, semi-structured interviews, and on-site observations. Ultimately, the methods you use are a balance of time and distance, the richness of the data required, and whether you want to understand the requirements for, or measure the performance of, a site or tool.

Analytics are great for understanding how people found you, what they're doing on your site, and how this changes over time. They can help you work out which bits of a website need tweaking, and measure the impact of any changes you make. But that only gets you so far – how do you know which trends are meaningful and which are just noise? To understand why people are doing what they do, you need other forms of research to flesh out the numbers.

One-question surveys are a great way of finding out why people are on your site, and whether they've succeeded in achieving their goals for being there. We linked survey answers to analytics for the last Let's Get Real project so we could see how people who were there for different reasons behaved on the site, but you don't need to go that far – any information about why people are on your site is better than none!
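If you do want to try linking survey answers to analytics, the idea can be sketched in a few lines of Python. This is only an illustration, not how the Let's Get Real project did it: the session IDs, the 'reason for visiting' answers, the pages_viewed field and the pages_by_reason function are all made up for the example, and it assumes your survey tool and your analytics export share some kind of session identifier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: one survey answer per session that responded...
survey_answers = {
    "s1": "planning a visit",
    "s2": "research",
    "s3": "planning a visit",
}

# ...and one analytics record per session (s4 never answered the survey).
sessions = [
    {"session_id": "s1", "pages_viewed": 3},
    {"session_id": "s2", "pages_viewed": 12},
    {"session_id": "s3", "pages_viewed": 5},
    {"session_id": "s4", "pages_viewed": 1},
]


def pages_by_reason(survey_answers, sessions):
    """Average pages viewed per stated reason for visiting."""
    grouped = defaultdict(list)
    for session in sessions:
        reason = survey_answers.get(session["session_id"])
        if reason is None:
            continue  # skip sessions with no survey answer
        grouped[reason].append(session["pages_viewed"])
    return {reason: mean(views) for reason, views in grouped.items()}


averages = pages_by_reason(survey_answers, sessions)
for reason, avg in averages.items():
    print(f"{reason}: {avg} pages on average")
```

Even a rough grouping like this lets you compare how people with different motivations behave, rather than looking at one blended average for everyone.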

5 second tests and lightweight usability tests are both ways to find out how well a design works for its intended audiences. 5 second tests show people an interface for 5 seconds, then ask them what they remember about it, or where they'd click to do a particular task. They're a good way to make sure your text and design are clear. Usability tests take from a few minutes to an hour, and are usually done in person. One of my favourite lightweight tests involves grabbing a sketch, an iPad or laptop and asking people in a café or other space if they'd help by testing a site for a few minutes. You can gather lots of feedback really quickly, and report back with a prioritised list of fixes by the end of the day.

Semi-structured interviews use the same set of questions each time to ensure some consistency between interviews, but they're flexible enough to let you delve into detail and follow any interesting diversions that arise during the conversation. Interviews and observations can be even more informative if they're done in the space where the activities you're interested in take place. 'Contextual inquiry' goes a step further by including observations of the tasks you're interested in being performed. If you can 'apprentice' yourself to someone, it's a great way to have them explain to you why things are done the way they are. However, it's obviously a lot more difficult to find someone willing and able to let you observe them in this way. It's also not appropriate for every task or research question, and the data that results can be so rich and dense with information that it takes a long time to review and analyse.

And one final titbit of wisdom from a musical – always look on the bright side of life! Any knowledge is better than none, so if you manage to get any audience research or usability testing done then you're already better off than you were before.

[Update: a comment on twitter reminded me of another favourite research thing: if you don't yet have a site/app/campaign/whatever, test a competitor's!]