history – Open Objects

Three ways you can help with 'In their own words: collecting experiences of the First World War' (and a CENDARI project update)

Somehow it's a month since I posted about my CENDARI research project (in Moving forward: modelling and indexing WWI battalions) on this site. That probably reflects the rhythm of the project – less trying to work out what I want to do and more getting on with doing it. A draft post I started last month simply said, 'A lot of battalions were involved in World War One'. I'll do a retrospective post soon; in the meantime, here's a quick summary of ongoing work.

First, a quick recap. My project has two goals – one, to collect a personal narrative for each battalion in the Allied armies of the First World War; two, to create a service that would allow someone to ask 'where was a specific battalion at a specific time?'. Together, they help address a common situation for people new to WWI history who might ask something like 'I know my great-uncle was in the 27th Australian battalion in March 1916, where would he have been and what would he have experienced?'.

I've been working on streamlining and simplifying the public-facing task of collecting a personal narrative for each battalion, and have written a blog post, Help collect soldiers’ experiences of WWI in their own words, that reduces it to three steps:

Take one of the diaries, letters or memoirs listed on the Collaborative Collections wiki.
Match its author with a specific regiment or battalion.
Send in the results via this form.

If you know of a local history society, family historian or anyone else who might be interested in helping, please send them along to this post: Help collect soldiers’ experiences of WWI in their own words.

Work on specifying the data structures needed to support a look-up service that answers questions about a specific unit's location and activities at a specific time has largely moved to the wiki:

Talk:British battalions and regiments in World War I
Talk:British Army Hierarchies
Template talk:Battalion – what information should be recorded on every battalion/unit page?
Template talk:Infobox command structure – what structured data should be recorded about military hierarchies?
Template talk:Infobox theatre of war/doc – what structured data should be recorded about a unit's activities and engagements in the war?
Template talk:Infobox military unit – what structured data should be recorded about a battalion/unit?

You can see the infobox structures in progress by flipping from the talk to the Template tabs. You'll need to request an account to join in but more views, sample data and edge cases would be really welcome.
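
The core look-up the project is aiming for ('where was a specific battalion at a specific time?') can be sketched as a set of dated records per unit. This is a minimal sketch in Python, assuming each record holds a unit name, a place and inclusive start and end dates; the `Posting` record, the `where_was` function and the sample units and places are hypothetical placeholders, not the wiki's actual infobox fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Posting:
    """A hypothetical record: where a unit was between two dates (inclusive)."""
    unit: str
    place: str
    start: date
    end: date


def where_was(postings: List[Posting], unit: str, when: date) -> Optional[str]:
    """Return the recorded place for `unit` on `when`, or None if no record covers that date."""
    for posting in postings:
        if posting.unit == unit and posting.start <= when <= posting.end:
            return posting.place
    return None


# Placeholder data only: not real battalion movements.
postings = [
    Posting("1st Example Battalion", "Place A", date(1915, 9, 1), date(1916, 2, 29)),
    Posting("1st Example Battalion", "Place B", date(1916, 3, 1), date(1916, 6, 30)),
]

print(where_was(postings, "1st Example Battalion", date(1916, 3, 15)))  # Place B
```

Returning None rather than guessing seems important here: gaps in the records are expected, and an interface built on this should be able to say 'unknown' rather than fill them in.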

Populating the list of battalions and other units has been a huge task in itself, partly because very few cultural institutions have definitive lists of units they can (or want to) share, but it's necessary to support both core goals. I've been fortunate to have help (see 'Thanks and recent contributions' on 'How you can help') but the task is on-going so get in touch if you can help!

So there are three different ways you can help with 'In their own words: collecting experiences of the First World War':

collect diaries linked to specific battalions;
help check or complete the lists of Australian battalions, British battalions and regiments, Canadian battalions and regiments, Indian battalions, Italian battalions and New Zealand battalions in World War I;
review and contribute to the data structures needed to record information about military units in the Talk and Template pages above

Finally, last week I was in New Zealand to give a keynote on this work at the National Digital Forum. The video for 'Collaborative collections through a participatory commons' is online, so you can catch up on the background for my project if you've got 40 minutes or so to spare. Should you be in Dublin, I'm giving a talk on 'A pilot with public participation in historical research: linking lived experiences of the First World War' at the Trinity Long Room Hub today (thus the poster).

And if you've made it this far, perhaps you'd like to apply for one of the CENDARI Visiting Research Fellowships 2015 yourself?

Moving forward: modelling and indexing WWI battalions

A super-quick update from my CENDARI Fellowship this week. I set up the wiki for In their own words: linking lived experiences of the First World War a week ago but only got stuck into populating it with lists of various national battalions this week. My current task list, copied from the front page, is to:

Populate list of military units: Australian battalions in World War I, British battalions and regiments in World War I, Canadian battalions in World War I, Indian battalions in World War I, Italian battalions in World War I, New Zealand battalions in World War I. A list of battalions is needed to form the basis for the collecting process. (I'm starting with a list of divisions because I can get it from Wikipedia, but I know this is problematic.)
Collate lists of personal diaries, letters, memoirs that can be linked to units through their authors
Collate lists of official unit diaries and histories
Collate resources on researching World War One records to help researchers know where to start
Create a sample battalion page as a demonstrator to show how personal accounts can be linked
Collate information about private letters, diaries and memoirs

If you can help with any of that, let me know! Or just get stuck in and edit the site.

I've started another Google Doc with very sketchy Notes towards modelling information about World War One Battalions. I need to test it with more battalion histories and update it iteratively. At this stage my thinking is to turn it into an InfoBox format to create structured data via the wiki. It's all very lo-fi and much less designed than my usual projects, but I'm hoping people will be able to help regardless.

So, in this phase of the project, the aim is to find a personal narrative – a diary, letters, memoirs or images – for each military unit in the British Army. Can you help?

In which I am awed by the generosity of others, and have some worthy goals

A quick update from my CENDARI fellowship working on a project that's becoming 'In their own words: linking lived experiences of the First World War'. I've spent the week reading (again a mixture of original diaries and letters, technical stuff like ontology documentation and also WWI history forums and 'amateur' sites) and writing. I put together a document outlining a range of possible goals and some very sketchy tech specs, and opened it up for feedback. The goals I set out are copied below for those who don't want to delve into detail. The commentable document, 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions goes into more detail.

However, the main point of this post is to publicly thank those who've helped by commenting and sharing on the doc, on twitter or via email. Hopefully I'm not forgetting anyone, as I've been blown away by and am incredibly grateful for the generosity of those who've taken the time to at least skim 1600 words (!). It's all helped me clarify my ideas and find solutions I'm able to start implementing next week. In no order at all – at CENDARI, Jennifer Edmond, Alex O'Connor, David Stuart, Benjamin Štular, Francesca Morselli, Deirdre Byrne; online Andrew Gray @generalising; Alex Stinson @ DHKState; jason webber @jasonmarkwebber; Alastair Dunning @alastairdunning; Ben Brumfield @benwbrum; Christine Pittsley; Owen Stephens @ostephens; David Haskiya @DavidHaskiya; Jeremy Ottevanger @jottevanger; Monika Lechner @lemondesign; Gavin Robinson @merozcursed; Tom Pert @trompet2 – thank you all!

Worthy goals (i.e. things I'm hoping to accomplish, with the help of historians and the public; only some of which I'll manage in the time)

At the end of this project, someone who wants to research a soldier in WWI but doesn't know a thing about how armies were structured should be able to find a personal narrative from a soldier in the same bit of the army, to help them understand experiences of the Great War.

Hopefully these personal accounts will provide some context, in their own words, for the lived experiences of WWI. Some goals listed are behind-the-scenes stuff that should just invisibly make personal diaries, letters and memoirs more easily discoverable. This needs datasets that provide structures that support relationships between people and documents; participatory interfaces for creating or enhancing information about contemporary materials (which feed into those supporting structures); and interfaces that use the data created.

More specifically, my goals include:

A personal account by someone in each unit linked to that unit's record, so that anyone researching a WWI name would have at least one account to read. To populate this dataset, personal accounts (diaries, letters, etc) would need to be linked to specific soldiers, who can then be linked to specific units. Linking published accounts such as official unit histories would be a bonus. [Semantic MediaWiki]
Researched links between individual men and the units they served in, to allow their personal accounts to be linked to the relevant military unit. I'm hoping I can find historians willing to help with the process of finding and confirming the military unit the writer was in. [Semantic MediaWiki]
A platform for crowdsourcing the transcription and annotation of digitised documents. The catch is that the documents for transcription would be held remotely on a range of large and small sites, from Europeana's collection to library sites that contain just one or two digitised diaries. Documents could be tagged/annotated with the names of people, places, events, or concepts represented in them. [Semantic MediaWiki??]
A structured dataset populated with the military hierarchy (probably based on The British order of battle of 1914-1918) that records the start and end dates of each parent-child relationship (an example of how much units moved within the hierarchy)
A published webpage for each unit, to hold those links to official and personal documents about that unit in WWI. In future this page could include maps, timelines and other visualisations tailored to the attributes of a unit, possibly including theatres of war, events, campaigns, battles, number of privates and officers, etc. (Possibly related to CENDARI Work Package 9?) [Semantic MediaWiki]
A better understanding of what people want to know at different stages of researching WWI histories. This might include formal data gathering, possibly a combination of interviews, forum discussions or surveys.
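
Recording start and end dates on each parent-child relationship means the hierarchy can be read 'as of' any date. A minimal Python sketch, assuming each link is stored as (child, parent, start, end) with inclusive dates; the `Attachment` record, the helper names and the brigade and division names are hypothetical, not taken from the order of battle.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Attachment:
    """A hypothetical dated parent-child link in the military hierarchy."""
    child: str
    parent: str
    start: date
    end: date  # inclusive


def parent_on(links: List[Attachment], unit: str, when: date) -> Optional[str]:
    """Return the unit's parent on the given date, if one is recorded."""
    for link in links:
        if link.child == unit and link.start <= when <= link.end:
            return link.parent
    return None


def chain_on(links: List[Attachment], unit: str, when: date) -> List[str]:
    """Walk up from a unit to the top of the recorded hierarchy on a given date."""
    chain = [unit]
    current = unit
    while (parent := parent_on(links, current, when)) is not None:
        if parent in chain:
            raise ValueError(f"cycle in hierarchy at {parent!r}")
        chain.append(parent)
        current = parent
    return chain


# Placeholder data: the battalion moves between brigades partway through 1916.
links = [
    Attachment("1st Example Battalion", "Brigade X", date(1915, 1, 1), date(1916, 5, 31)),
    Attachment("1st Example Battalion", "Brigade Y", date(1916, 6, 1), date(1918, 11, 11)),
    Attachment("Brigade X", "Division 1", date(1915, 1, 1), date(1918, 11, 11)),
    Attachment("Brigade Y", "Division 2", date(1915, 1, 1), date(1918, 11, 11)),
]

print(chain_on(links, "1st Example Battalion", date(1916, 3, 1)))
# ['1st Example Battalion', 'Brigade X', 'Division 1']
```

Keeping the dates on the link rather than on the unit means a battalion's move from one brigade to another becomes a new row rather than an overwrite, so earlier attachments stay queryable.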

Goals that are more likely to drop off, or become quick experiments to see how far you can get with accessible tools:

Trained 'named entity recognition' and 'natural language processing' tools that could be run over transcribed text to suggest possible people, places, events, concepts, etc [this might drop off the list as the CENDARI project is working on a tool called Pineapple (PDF poster). That said, I'll probably still experiment with the Stanford NER tool to see what the results are like]
A way of presenting possible matches from the text tools above for verification or correction by researchers. Ideally, this would be tied in with the ability to annotate documents
The ability to search across different repositories for a particular soldier, to help with the above.

Commentable version: 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions.

Linking lived experiences of WWI through battalions?

Another update from my CENDARI Fellowship at Trinity College Dublin, looking at 'In their own words: linking lived experiences of the First World War', which is a small-scale, short-term pilot based on WWI collections. My first post is Defining the scope: week one as a CENDARI Fellow. Over the past two weeks I've done a lot of reading – more WWI diaries and letters; WWI histories and historiography; specialist information like military structures (orders of battle, etc). I've also sketched out lots of snippets of possible functions, data, relationships and other outcomes.

I've narrowed the key goal (or minimum viable product, if you prefer) of my project to linking personal accounts of the war – letters, diaries, memoirs, photographs, etc – to battalions, by creating links from the individual who wrote them to their military unit. Once these personal accounts are linked to particular military units, they can be linked to higher units – from the battalion, ship or regiment to brigade, corps, etc – and to particular places, activities, events and campaigns. The idea behind this is to provide context for an individual's experience of WWI by linking to narratives written by people in the same situation. I'm still working out how to organise the research process of matching the right soldier to the right battalion/regiment/ship so that relevant personal stories are discoverable. I'm also still working out which attributes of a battalion are relevant, how granular the data will be, and how to design for the inevitable variation in data quality (for example, the availability of records for different armies varies hugely). Finally, I’m still working out which bits need computer science tools and which need the help of other historians.
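
The linking step described above is essentially a two-hop join: account to author, then author to unit. A minimal Python sketch, assuming both links are stored as simple dictionaries; the identifiers are hypothetical placeholders. Accounts whose author hasn't yet been matched to a unit are returned separately, since those are the ones that need research.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accounts_by_unit(
    account_authors: Dict[str, str],
    author_units: Dict[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group personal accounts under the unit their author served in.

    Returns (unit -> account IDs, accounts whose author still needs research).
    """
    index: Dict[str, List[str]] = defaultdict(list)
    unresolved: List[str] = []
    for account, author in account_authors.items():
        unit = author_units.get(author)
        if unit is None:
            unresolved.append(account)  # author not yet matched to a unit
        else:
            index[unit].append(account)
    return dict(index), unresolved


# Placeholder identifiers, not real collection items or soldiers.
account_authors = {"diary-001": "Soldier A", "letters-002": "Soldier B", "memoir-003": "Soldier C"}
author_units = {"Soldier A": "1st Example Battalion", "Soldier B": "1st Example Battalion"}

index, unresolved = accounts_by_unit(account_authors, author_units)
print(index)       # {'1st Example Battalion': ['diary-001', 'letters-002']}
print(unresolved)  # ['memoir-003']
```

This sketch assumes one unit per author; men could transfer between units, so in practice the author-to-unit link would probably need dates as well.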

Given the number of centenary projects, I was hoping to find more structured data about WWI entities. Trenches to Triples would be a useful source of permanent URLs, and terms to train named entity recognition, but am I missing other sources?

There's a lot of content, and so much activity around WWI records, but it's spread out across the internet. Individual people and small organisations are digitising and transcribing diaries and letters. Big collecting projects like Europeana have lots of personal accounts, but they're often not transcribed and they don't seem to be linked to structured data about the item itself. Some people have painstakingly transcribed unit diaries, but they're not linked from the official site, so others wouldn't know there's a more easily read version of the diary available. I've been wondering if you could crowdsource the process of transcribing records held elsewhere, and offer the transcripts back to sites. Using dedicated transcription software would let others suggest corrections, and might also make it possible to link sections of the text to external 'entities' like names, places, events and concepts.

Albert Henry Bailey. Image: Sir George Grey Special Collections, Auckland Libraries, AWNS-19150909-39-5

To help figure out the issues researchers face and the variations in available resources, I'm researching randomly selected soldiers from different Allied forces. I've posted my notes on Private Albert Henry Bailey, service number 13/970a. You'll see that they're in prose form, and don't contain any structured data. Most of my research used digitised-but-not-transcribed images of documents, with some transcribed accounts. It would definitely benefit from deeper knowledge of military history – for a start, which battalions were in the same place as his unit at the same time?

This account of the arrival and first weeks of the Auckland Mounted Rifles at Gallipoli from the official unit history gives a sense of the density and specificity of local place names, as does the official unit diary, and I assume many personal accounts. I'm not sure how named entity recognition tools will cope, and ideally I'd like to find lists of places to 'train' the tools (including possibly some from the 'Trenches to Triples' project).
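
Before training a named entity recognition tool, a plain gazetteer (a list of known place names) can already suggest candidate places in transcribed text. This is a much simpler technique than the statistical models in tools like Stanford NER, shown here in Python only as a baseline sketch; the three Gallipoli place names are illustrative and the sample sentence is invented, not quoted from the unit diary.

```python
import re
from typing import Iterable, List, Tuple


def build_matcher(place_names: Iterable[str]) -> re.Pattern:
    """Compile one regex that matches any gazetteer entry as a whole phrase."""
    # Longest names first, so a longer name wins over a shorter one it contains.
    names = sorted(set(place_names), key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    return re.compile(pattern)


def suggest_places(text: str, matcher: re.Pattern) -> List[Tuple[str, int, int]]:
    """Return (name, start, end) for each gazetteer match, for a person to verify."""
    return [(m.group(0), m.start(), m.end()) for m in matcher.finditer(text)]


gazetteer = ["Anzac Cove", "Walker's Ridge", "Chunuk Bair"]
matcher = build_matcher(gazetteer)

sample = "Moved up from Anzac Cove to Walker's Ridge at dusk; Chunuk Bair beyond."
print([name for name, _, _ in suggest_places(sample, matcher)])
# ['Anzac Cove', "Walker's Ridge", 'Chunuk Bair']
```

Matches are only suggestions: if the same name was used for more than one place, a person still needs to check which one is meant, and the character offsets make it easy to highlight each candidate in the text for that check.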

If there aren't already any structured data sources for military hierarchies in WWI, do I have to make one? And if so, how? The idea would be to turn prose descriptions like this Australian War Memorial history of the 27th AIF Battalion, this order of battle of the 2nd Australian Division and any other suitable sources into structured data. I can see some ways it might be possible to crowdsource it, but it's a big task. Still, it's worth it – providing a service that lets people look up which higher military units, places, activities and campaigns a particular battalion/regiment/ship was linked to at a given time would be a good legacy for my research.

I'm sure I'm forgetting lots of things, and my list of questions is longer than my list of answers, but I should end here. To close, I want to share a quote from the official history of the Auckland Mounted Rifles. The author said he 'would like to speak of the splendid men of the rank and file who died during this three months' struggle. Many names rush to the memory, but it is not possible to mention some without doing an injustice to the memory of others'. I guess my project is driven by a vision of doing justice to the memory of every soldier, particularly those ordinary men who aren't as easily found in the records. I'm hoping that drawing on the work of other historians and re-linking disparate sources will help provide as much context as possible for their experiences of the First World War.

—
Update, 15 October 2014: if you've made it this far, you might also be interested in chipping in at 'Linking lived experiences of the First World War': possible goals and a bunch of technical questions.

Why we need to save the material experience of software objects

Conversations at last month's Sustainable History: Ensuring today's digital history survives event [my slides] (and at the pub afterwards) touched on saving the data underlying websites as a potential solution for archiving them. This is definitely better than nothing, but as a human-computer interaction researcher and advocate for material culture in historical research, I don't think it's enough.

Just as people rue the loss of the information and experiential data conveyed by the material form of objects when they're converted to digital representations – size, paper and print/production quality, marks from wear through use and manufacture, access to its affordances, to name a few – future researchers will rue the information lost if we don't regard digital interfaces and user experiences as vital information about the material form of digital content and record them alongside the data they present.

Can you accurately describe the difference between using MySpace and Facebook in their various incarnations? There's no perfect way to record the experience of using Facebook in December 2013 so it could be compared with the experience of using MySpace in 2005, but usability techniques like screen-recording software linked to eyetracking or think-aloud tests would help preserve some of the tacit knowledge and context users bring to sites alongside the look-and-feel, algorithms and treatments of data the sites present to us. It's not a perfect solution, but a recording of the interactions and designs from both sites for common tasks like finding and adding a friend would tell future researchers infinitely more about changes to social media sites over eight years than simple screenshots or static webpages. But in this case we're still missing the notifications on other people's screens, the emails and algorithmic categorisations that fan out from simple interactions like these…

Even if you don't care about history, anyone studying software – whether websites, mobile apps, digital archives, instrument panels or procedural instructions embedded in hardware – still needs solid methods for capturing the dynamic and subjective experience of using digital technologies. As Lev Manovich says in The Algorithms of Our Lives, when we use software we're "engaging with the dynamic outputs of computation"; studying software culture requires us to "record and analyze interactive experiences, following individual users as they navigate a website or play a video game … to watch visitors of an interactive installation as they explore the possibilities defined by the designer—possibilities that become actual events only when the visitors act on them".

The Internet Archive does a great job, but in researching the last twenty years of internet history I'm constantly hitting the limits of their ability to capture dynamic content, let alone the nuance of interfaces. The paradox is that as more of our experiences are mediated through online spaces and the software contained within small boxy devices, we risk leaving fewer traces of our experiences than past generations.

We're all looking at the stars: citizen science projects at ZooCon13

Last Saturday I escaped my desk to head to the Physics department at the University of Oxford and be awed by what we're learning about space (and more terrestrial subjects) through citizen science projects run by Zooniverse at ZooCon13. All the usual caveats about notes from events apply – in particular, assume any errors are mine and that everyone was much more intelligent and articulate than my notes make them sound. These notes are partly written for people in cultural heritage and the humanities who are interested in the design of crowdsourcing projects, and while I enjoyed the scientific presentations I am not even going to attempt to represent them! Chris Lintott live-blogged some of the talks on the day, so check out 'Live from ZooCon' for more. If you're familiar with citizen science you may well know a lot of these examples already – and if you're not, you can't really go wrong by looking at Zooniverse projects.

Aprajita Verma kicked off with SpaceWarps and 'Crowd-sourcing the Discovery of Gravitational Lenses with Citizen Scientists'. She explained the different ways gravitational lenses show up in astronomical images, and that 'strong gravitational lensing research is traditionally very labour-intensive' – computer algorithms generate lots of false positives, so you need people to help. SpaceWarps includes some simulated lenses (i.e. images of the sky with lenses added), mostly as a teaching tool (to provide more examples and increase familiarity with what lenses can look like) but also to make it more interesting for participants. The SpaceWarps interface lets you know when you've missed a (simulated, presumably) lens as well as noting lenses you've marked. They had 2 million image classifications in the first week, and 8500 citizen scientists have participated so far, 40% of whom have participated in 'Talk', the discussion feature. As discussed in their post 'What happens to your markers? A look inside the Space Warps Analysis Pipeline', they've analysed the results so far to see where markers fall between astute and obtuse, and between pessimistic and optimistic – it turns out most people are astute. Each image is reviewed by ten people, so they've got confidence in the results.
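
One way to combine ten people's marks on the same image, while allowing for how reliable each marker is, is a per-volunteer Bayesian update. The Space Warps team describe their own analysis in the post mentioned above; the Python below is only a simplified sketch of the general idea, assuming each volunteer's skill is summarised by two hypothetical numbers: how often they spot a real lens (`p_lens`) and how often they correctly pass over an image with no lens (`p_not`).

```python
from typing import Iterable, Tuple


def update(prior: float, said_lens: bool, p_lens: float, p_not: float) -> float:
    """Bayes' rule for one classification.

    p_lens: probability this volunteer marks a lens when one is there.
    p_not:  probability this volunteer marks nothing when there is no lens.
    """
    if said_lens:
        like_lens, like_not = p_lens, 1.0 - p_not
    else:
        like_lens, like_not = 1.0 - p_lens, p_not
    numerator = like_lens * prior
    return numerator / (numerator + like_not * (1.0 - prior))


def posterior(prior: float, classifications: Iterable[Tuple[bool, float, float]]) -> float:
    """Apply each (said_lens, p_lens, p_not) classification in turn."""
    probability = prior
    for said_lens, p_lens, p_not in classifications:
        probability = update(probability, said_lens, p_lens, p_not)
    return probability


# A skilled volunteer moves the estimate a lot; a coin-flipper leaves it unchanged.
print(round(update(0.5, True, 0.9, 0.9), 3))  # 0.9
print(round(update(0.3, True, 0.5, 0.5), 3))  # 0.3
```

In a model like this, agreement is not just a head-count: a few reliable markers can outweigh several unreliable ones. The simulated lenses could also, in principle, be used to estimate each person's `p_lens` and `p_not`, since the right answer is known for them.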

Karen Masters talked about 'Cosmic Evolution in the Galaxy Zoo', taking us back to the first Galaxy Zoo project's hopes to have 30,000 volunteers and contrasting that with subsequent peer-reviewed papers that thanked 85,000, or 160,000 or 200,000 volunteers. The project launched in 2007 (before the Zooniverse itself) to look at spiral vs elliptical galaxies and it's all grown from there. The project has found rare objects, most famously the pea galaxies, and as further proof that the Zooniverse is doing 'real science online', the team have produced 36 peer-reviewed papers, some with 100+ citations. At least 50 more papers have been produced by others using their data.

Phil Brohan discussed 'New Users for Old Weather'. The Old Weather project is using data from historic ships' logs to help answer the question 'is this climate change or just weather?'. Some data was already known but there's a 'metaphorical fog' from missing observations from the past. Since the BBC won't let him put a satellite in a Tardis, they've been creative about finding other sources to help lift 'the fog of ignorance'. This project has long fascinated me because it started off all about science: in Phil's words, 'when we started all this, I was only thinking about the weather', but ended up being about history as well: 'these documents are intrinsically interesting'– he learnt what else was interesting about the logs from project participants who discovered the stories of people, disasters and strange events that lay within them. The third thing the project has generated (after weather and history) is 'a lot of experts'. One example he gave was evidence of the 1918-19 Spanish flu epidemic on board ship, which was investigated after forum posts about it. There's still a lot to do, with more logs to come (possibly including French and Dutch ones), and things would ideally speed up 'by a factor of ten'.

In Brooke Simmons' talk on 'Future plans for Galaxy Zoo', she raised the eternal issue of what to call participants in crowdsourcing: 'just call everyone collaborators'. 'Citizen scientists' makes a distinction between paid and unpaid scientists, as does 'volunteers'. She wants to help people do their own science, and they're working on making it easier than downloading and learning how to use more complicated tools. As an example, she talked about people collecting 'galaxies with small bulges' and analysing the differences in bulges (like a souped-up Galaxy Zoo Navigator?). She also talked about Zoo Teach, with resources for learning at all ages.

After the break we learnt about 'The Planet 4 Invasion', the climate and seasons of Mars from Meg Schwamb and about Solar Stormwatch in 'Only you can save planet Earth!' from Chris Davis, who was also presenting research from his student Kim Tucker-Wood (sp?). Who knew that solar winds could take the tail off a comet?!

Next up was Chris Lintott on 'Planet Hunting with and without Kepler'. Science communication advice says 'don't show people graphs', and since Planet Hunters is looking at graphs for fun, he thought no-one would want to do Planet Hunters. However, the response has surprised him. And 'it turns out that stars are actually quite interesting as well'. In another example of participants going above and beyond the original scope of the project, project participants watched a talk on 'heartbeat binaries' streamed online, then found 30 of them in archives and their own records, and posted them on the forum. Now a bunch of Planet Hunters are working with the Kepler team to follow them up. (As an aside, he showed a screenshot of a future journal paper – the journal couldn't accept the idea that you could be a Planet Hunter and not be part of an academic team so they're listed as the Department of Astronomy at Yale.)

The final speaker was Rob Simpson on 'The Future of the Zooniverse'. To put things in context, he said the human race spends 16 years cumulatively playing the game Angry Birds every day; people spend 2 months every day on the Zooniverse. In the past year, the human race spent 52 years on the Zooniverse's 15 live projects (they've had 23 projects in total). The Andromeda project went through all their data in 22 days – other projects take longer, but still attract dedicated people. In the Zooniverse's immediate future are 'tools for (citizen) scientists' – adding the ability to do analysis in the browser, 'because people have a habit of finding things, just by being given access to the data'. They're also working on 'Letters' – public versions of what might otherwise be detailed forum posts that can be cited, and as a form of publication, it puts them 'in the domain'. They're helping people communicate with each other and embracing their 'machine overlords', using Galaxy Zoo as a training tool for machine learning. As computers get more powerful, the division of work between machines and people will change, perhaps leaving the beautiful, tricky, or complex bits for humans. [Update, June 29, 2013: Rob's posted about his talk on the Zooniverse blog, '52 Years of Human Effort', and corrected his original figure of 35 years to 52 years of human effort.]

At one point a speaker asked who in the room was a moderator on a Zooniverse project, and nearly everyone put their hand up. I felt a bit like giving them a round of applause because their hard work is behind the success of many projects. They're also a lovely, friendly bunch, as I discovered in the pub afterwards.

Conversations in the pub also reminded me of the flipside of people learning so much through these projects – sometimes people lose interest in the original task as their skills and knowledge grow, and it can be tricky to find time to contribute outside of moderating. After a comment by Chris at another event I've been thinking about how you might match people to crowdsourcing projects or tasks – sometimes it might be about finding something that suits their love of the topic, or that matches the complexity or type of task they've previously enjoyed, or finding another unusual skill to learn, or perhaps building really solid stepping stones from their current tasks to more complex ones. But it's tricky to know what someone likes – I quite like transcribing text on sites like Trove or Notes from Nature, but I didn't like it much on Old Weather. And my own preferences change – I didn't think much of Ancient Lives the first time I saw it, but on another occasion I ended up getting completely absorbed in the task. Helping people find the right task and project is also a design issue for projects that have built an 'ecosystem' of parts that contribute to a larger programme, as discussed in 'Using crowdsourcing to manage crowdsourcing' in Frequently Asked Questions about crowdsourcing in cultural heritage and 'A suite of museum metadata games?' in Playing with Difficult Objects – Game Designs to Improve Museum Collections.

An event like ZooCon showed how much citizen science is leading the way – there are lots of useful lessons for humanities and cultural heritage crowdsourcing. If you've read this thinking 'I'd love to try it for my data, but x is a problem', try talking to someone about it – often there are computational techniques for solving similar problems, and if it's not already solved it might be interesting enough that people want to get involved and work with you on it.

Notes from 'Crowdsourcing in the Arts and Humanities'

Last week I attended a one-day conference, 'Digital Impacts: Crowdsourcing in the Arts and Humanities' (#oxcrowd), convened by Kathryn Eccles of Oxford's Internet Institute, and I'm sharing my (sketchy, as always) notes in the hope that they'll help people who couldn't attend.

Stuart Dunn reported on the Humanities Crowdsourcing scoping report (PDF) he wrote with Mark Hedges and noted that if we want humanities crowdsourcing to take off we should move beyond crowdsourcing as a business model and look to form, nurture and connect with communities. Alice Warley and Andrew Greg presented a useful overview of the design decisions behind the Your Paintings Tagger and sparked some discussion on how many people need to view a painting before it's 'completed', and the differences between structured and unstructured tagging. Interestingly, paintings can be 'retired' from the Tagger once enough data has been gathered – I personally think the inherent engagement in tagging is valuable enough to keep paintings taggable forever, even if they're not prioritised in the tagging interface. Kate Lindsay brought a depth of experience to her presentation on 'The Oxford Community Collection Model' (as seen in Europeana 1914-1918 and RunCoCo's 2011 report on 'How to run a community collection online' (PDF)). Some of the questions brought out the importance of planning for sustainability in technology, licences, etc, and the role of existing networks of volunteers with the expertise to help review objects on the community collection days. The role of the community in ensuring the quality of crowdsourced contributions was also discussed in Kimberly Kowal's presentation on the British Library's Georeferencer project. She also reflected on what she'd learnt after the first phase of the Georeferencer project, including that the inherent reward of participating in the activity was a bigger motivator than competitiveness, and the impact on the British Library itself, which has opened up data for wider digital uses and has more crowdsourcing projects planned. 
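
The trade-off between retiring paintings and keeping them taggable can be handled in the queue rather than by removing items. A minimal Python sketch, assuming a hypothetical retirement threshold of five distinct taggers and a two-vote agreement rule (neither is the Your Paintings Tagger's actual setting): retired paintings drop to the back of the queue but are never taken out of it.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

RETIRE_AFTER = 5  # hypothetical: distinct taggers before a painting counts as 'completed'


class TaggingQueue:
    def __init__(self, painting_ids: Iterable[str]) -> None:
        self.viewers: Dict[str, Set[str]] = {pid: set() for pid in painting_ids}
        self.tag_votes: Dict[str, Counter] = {pid: Counter() for pid in self.viewers}

    def record(self, painting_id: str, user: str, tags: Iterable[str]) -> None:
        """Store one user's tags; repeat submissions by the same user are ignored."""
        if user in self.viewers[painting_id]:
            return
        self.viewers[painting_id].add(user)
        self.tag_votes[painting_id].update(set(tags))

    def is_retired(self, painting_id: str) -> bool:
        return len(self.viewers[painting_id]) >= RETIRE_AFTER

    def queue(self) -> List[str]:
        """Unretired paintings first (fewest taggers first); retired ones stay, at the end."""
        return sorted(
            self.viewers,
            key=lambda pid: (self.is_retired(pid), len(self.viewers[pid]), pid),
        )

    def agreed_tags(self, painting_id: str, min_votes: int = 2) -> List[str]:
        """Tags chosen independently by at least `min_votes` people."""
        return sorted(t for t, n in self.tag_votes[painting_id].items() if n >= min_votes)


q = TaggingQueue(["p1", "p2", "p3"])
for i in range(5):
    q.record("p1", f"user{i}", ["horse", "river"] if i < 2 else ["horse"])
q.record("p2", "user0", ["portrait"])
print(q.queue())            # ['p3', 'p2', 'p1']
print(q.agreed_tags("p1"))  # ['horse', 'river']
```
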
I gave a paper which was based on an earlier version, The gift that gives twice: crowdsourcing as productive engagement with cultural heritage, but pushed my thinking about crowdsourcing as a tool for deep engagement with museums and other memory organisations even further. I also succumbed to the temptation to play with my own definitions of crowdsourcing in cultural heritage: 'a form of engagement that contributes towards a shared, significant goal or research question by asking the public to undertake tasks that cannot be done automatically' or 'productive public engagement with the mission and work of memory institutions'.

Chris Lintott of Galaxy Zoo fame shared his definition of success for a crowdsourcing/citizen science project: it has to produce results of value to the research community in less time than could have been achieved by other means (i.e. the project must achieve something with the crowd that it couldn't have achieved without them). He discussed how the Ancient Lives project initially challenged that definition by turning 'a few thousand papyri they didn't have time to transcribe into several thousand data points they didn't have time to read'. While 'serendipitous discovery is a natural consequence of exposing data to large numbers of users' (in the words of the Citizen Science Alliance), the team wanted a more sophisticated method for recording the potential discoveries experts made while engaging with the material. They built a focused 'talk' tool which can programmatically filter out the most interesting unanswered comments and email them to their 30 or 40 expert users. They also have Letters for more structured, journal-style reporting (I hope I have that right). He also discussed decisions around full-text transcriptions (which are difficult to reconcile automatically) versus 'rich metadata': more structured indexes of the content of a page, which contain enough information to help historians decide which pages to transcribe in full for themselves.
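The talk tool's filtering step is only described in outline, so what follows is a minimal sketch under loudly stated assumptions: comments are held in memory, 'most interesting' is approximated by a hypothetical count of interest votes, and the result is formatted as an email body rather than actually sent. The names `Comment`, `select_for_experts` and `build_digest` are illustrative only, not part of the Ancient Lives or Zooniverse code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Comment:
    """A hypothetical discussion comment attached to a page or object."""
    comment_id: int
    text: str
    reply_count: int      # how many replies the comment has received so far
    interest_votes: int   # hypothetical signal: users who marked it as interesting


def select_for_experts(comments: List[Comment], limit: int = 5) -> List[Comment]:
    """Return the highest-voted comments that nobody has replied to yet.

    'Interesting' is approximated by interest_votes (an assumption); a real
    tool may use entirely different criteria.
    """
    unanswered = [c for c in comments if c.reply_count == 0]
    # Most votes first; ties broken by lower (older) comment_id
    unanswered.sort(key=lambda c: (-c.interest_votes, c.comment_id))
    return unanswered[:limit]


def build_digest(selected: List[Comment]) -> str:
    """Format the selected comments as a plain-text email body (not sent here)."""
    lines = ["Unanswered comments that may need an expert:"]
    for c in selected:
        lines.append(f"- #{c.comment_id} ({c.interest_votes} votes): {c.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    comments = [
        Comment(1, "Is this a Homeric fragment?", 0, 7),
        Comment(2, "Nice handwriting", 3, 1),
        Comment(3, "Possible date formula in line 4", 0, 4),
        Comment(4, "Duplicate of another fragment?", 0, 0),
    ]
    print(build_digest(select_for_experts(comments, limit=2)))
    # Unanswered comments that may need an expert:
    # - #1 (7 votes): Is this a Homeric fragment?
    # - #3 (4 votes): Possible date formula in line 4
```

Sending only a short, ranked digest is one way to respect the limited time of a small pool of experts, which seems to be the point of filtering before emailing.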

Some other thoughts that struck me during the day… Humanities crowdsourcing has a lot to learn from the application of maths and logic in citizen science: many problems (like validating data) that seem intractable can actually be solved algorithmically, and citizen science's hypothesis-based approach to testing task and interface design would help humanities projects. Niche projects help solve the problem of putting the right obscure item in front of the right user. (This was an issue I wrestled with during my short residency at the Powerhouse Museum last year; in hindsight, building niche projects could have meant a stronger call-to-action and no worries about getting people to navigate to the right range of objects.) The variable role of forums, and of participants' relationships with the project owners and with each other, came up at various points: in some projects, interactions with a central authority are more valued; in others, community interactions are really important. I wonder how much that depends on the length and size of the project. The potential and dangers of 'gamification' and 'badgeification', including their possible negative impact on motivation, were also raised. I agree with Lintott that games require a level of polish that could mean you'd invest more in making them than you'd get back in value. But as a form of engagement that can create deeper relationships with cultural heritage and/or validate some procrastination over a cup of tea, I think they potentially have a wider value that balances that.
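As a small illustration of validation that can be done algorithmically, and of the 'retirement' idea from the Your Paintings Tagger discussion, here is a minimal sketch of consensus tagging. It assumes each contributor submits a set of free-text tags for an item, that a tag is accepted when at least a hypothetical 60% of contributors independently supplied it, and that an item is retired after a hypothetical five contributors. These thresholds and function names are illustrative assumptions, not the rules used by the Tagger or any other project mentioned here.

```python
from collections import Counter
from typing import List, Set

# Hypothetical thresholds; a real project would tune these per task.
RETIRE_AFTER = 5   # independent contributors needed before an item is retired
AGREEMENT = 0.6    # fraction of contributors who must supply the same tag


def consensus_tags(submissions: List[Set[str]], agreement: float = AGREEMENT) -> Set[str]:
    """Keep tags that at least `agreement` of contributors independently supplied."""
    if not submissions:
        return set()
    # Normalise within each submission first, so one contributor typing
    # 'Horse' and 'horse' still counts only once.
    counts = Counter(
        tag for tags in submissions for tag in {t.strip().lower() for t in tags}
    )
    total = len(submissions)
    return {tag for tag, n in counts.items() if n / total >= agreement}


def is_retired(submissions: List[Set[str]], retire_after: int = RETIRE_AFTER) -> bool:
    """An item is retired once enough independent contributors have tagged it."""
    return len(submissions) >= retire_after


if __name__ == "__main__":
    subs = [
        {"Horse", "River"},
        {"horse", "bridge"},
        {"horse", "river"},
        {"Dog"},
        {"horse", "river ", "sky"},
    ]
    print(sorted(consensus_tags(subs)))  # ['horse', 'river']
    print(is_retired(subs))              # True
```

The two thresholds pull in different directions: requiring more independent contributors per item raises confidence in the agreed tags but, as came up in the panel discussion, can slow a project down.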

I was also asked to chair the panel discussion, which featured Kimberly Kowal, Andrew Greg, Alice Warley, Laura Carletti, Stuart Dunn and Tim Causer. Questions during the panel discussion included:

'what happens if your super-user dies?' (Super-users or super contributors are the tiny percentage of people who do most of the work, as in this Old Weather post) – discussion included mass media as a numbers game, the idea that someone else will respond to the need/challenge, and asking your community how they'd reach someone like them. (This also helped answer the question 'how do you find your crowd?' that came in from Twitter.)
'have you ever paid anyone?' Answer: no
'can you recruit participants through specialist societies?' From memory, the answer was 'yes but it does depend'.
something like 'have you met participants in real life?' – answer, yes, and it was an opportunity to learn from them, and to align the community, institution, subject and process.
'badgeification?'. Answer: the quality of the reward matters more than the levels (so badges are probably out).
'what happens if you force students to work on crowdsourcing projects?' – one suggestion was to look for posts about Transcribe Bentham on a US English class blog
'what's happened to tagging in art museums, where's the new steve.museum or Brooklyn Museum?' – is it normalised and not written about as much, or has it declined?
'how can you get funding for crowdsourcing projects?'. One answer – put a good application in to the Heritage Lottery Fund. Or start small, prove the value of the project and get a larger sum. Other advice was to be creative or use existing platforms. Speaking of which, last year the Citizen Science Alliance announced 'the first open call for proposals by researchers who wish to develop citizen science projects which take advantage of the experience, tools and community of the Zooniverse. Successful proposals will receive donated effort of the Adler-based team to build and launch a new citizen science project'.
'can you tell in advance which communities will make use of a forum?' – a great question that drew on various discussions of the role of communities of participants in supporting each other and devising new research questions
a question on 'quality control' provoked a range of responses, from the manual quality control in Transcribe Bentham to the high number of Taggers initially required for each painting in Your Paintings (which slowed things down), and led into a discussion of shallow vs deep interactions
the final questioner asked about documenting film with crowdsourcing and was answered by someone else in the audience, which seemed a very fitting way to close the day.

James Murray in his Scriptorium with thousands of word references sent in by members of the public for the first Oxford English Dictionary. Early crowdsourcing?

If you found this post useful, you might also like Frequently Asked Questions about crowdsourcing in cultural heritage or my earlier Museums and the Web paper on Playing with Difficult Objects – Game Designs to Improve Museum Collections.