Drinking about museums: the Manchester edition, July 10

A few years ago the Museums Computer Group committee started inviting people attending our events to join us for drinks the night before. For locals and for people who've travelled up for an event, it's a nice way to catch up with or meet people who are interested in technology in museums. These days people around the world are organising events under the #drinkingaboutmuseums label, so we thought we'd combine the two and have a #drinkingaboutmuseums in Manchester on Tuesday July 10, 2012. Come join us from 6:30pm at the Sandbar, 120 Grosvenor Street, Manchester M1 7HL.

And of course, the reason we're gathering: on Wednesday July 11, 2012, the MCG (@ukmcg) are running an event with the Digital Learning Network (@DLNet) on 'Engaging digital audiences in museums' in Manchester, so we'll have a mixed crowd of museum technologists and educators. Tickets may still be available at http://mcg-dlnet.eventbrite.com/, or you can follow the hashtag #EngageM on Twitter. You're welcome to join us for drinks even if you're not going to the conference.

If you've got any questions, just leave a comment or @-mention me (@mia_out) on Twitter. We'll also keep an eye on the #drinkingaboutmuseums tag. You can find out more about #drinkingaboutmuseums in my post about the June New York edition, which saw 20-ish museum professionals gather to chat over drinks.

Catch the wind? (Re-post from Polis blog on Spatial Narratives and Deep Maps)

[This post was originally written for the Polis Center's blog.]

Our time at the NEH Institute on Spatial Narratives & Deep Maps is almost at an end. The past fortnight feels both like it’s flown by and like we’ve been here for ages, which is possibly the right state of mind for thinking about deep maps. After two weeks of debate, deep maps still seem definable only when glimpsed in the periphery, yet not quite defined when examined directly. How can we capture the almost-tangible shape of a truly deep map that we can only glimpse through social constructs, the particular contexts of creation and usage, discipline and the models in current technology? If deep maps are an attempt to get beyond location-as-index and into space-as-experience, can that currently be done more effectively on a screen, or does covering a desk in maps and documents actually allow deeper immersion in a space at a particular time?

We’ve spent the past three days working in teams to prototype different interfaces to deep maps or spatial narratives, and each group presented their interfaces today. It’s been immensely fun and productive and also quite difficult at times.  It’s helped me realise that deep maps and spatial narratives are not dichotomous but exist on a scale – where do you draw the line between curating data sources and presenting an interpreted view of them?  At present, a deep map cannot be a recreation of the world, but it can be a platform for immersive thinking about the intersection of space, time and human lives.  At what point do you move from using a deep map to construct a spatial and temporal argument to using a spatial narrative to present it?

Our experience as the Broadway team reinforces Stuart’s point about the importance of the case study. We uncovered foundational questions whilst deep in the process of constructing interfaces: is a deep map a space for personal exploration, comparison and analysis of sources, or is it a shared vision that is personalised through the process of creating a spatial narrative? We also tried to think through how multivocality translates into something on a screen, and how interfaces that link one article or concept to multiple places might work in reality. In the process we rediscovered that each scholar may have different working methods, but that a clever interface can support multivocality in functionality as well as in content.

Halfway through 'deep maps and spatial narratives' summer institute

I'm a week and a bit into the NEH Institute for Advanced Topics in the Digital Humanities on 'Spatial Narrative and Deep Maps: Explorations in the Spatial Humanities', so this is a (possibly self-indulgent) post to explain why I'm over in Indianapolis and why I only seem to be tweeting with the #PolisNEH hashtag.  We're about to dive into three days of intense prototyping before wrapping things up on Friday, so I'm posting almost as a marker of my thoughts before the process of thinking-through-making makes me re-evaluate our earlier definitions.  Stuart Dunn has also blogged more usefully on Deep maps in Indy.

We spent the first week hearing from the co-directors David Bodenhamer (history, IUPUI), John Corrigan (religious studies, Florida State University), and Trevor Harris (geography, West Virginia University), from guest lecturers Ian Gregory (historical GIS and digital humanities, Lancaster University) and May Yuan (geonarratives, University of Oklahoma), and from selected speakers at the 'Digital Cultural Mapping: Transformative Scholarship and Teaching in the Geospatial Humanities' event at UCLA. We also heard about the other participants' projects and backgrounds, and tried to define 'deep maps' and 'spatial narratives'.

It's been pointed out that as we're at the 'bleeding edge', visions for deep mapping are still highly personal. As we don't yet have a shared definition, I don't want to misrepresent people's ideas by summarising them, so I'm just posting my current definition of deep maps:

A deep map contains geolocated information from multiple sources that convey their source, contingency and context of creation; it is both integrated and queryable through indexes of time and space.  

Essential characteristics: it can be a product, whether as a snapshot static map or as layers of interpretation with signposts, pre-set interactions and narrative, but it is always visibly a process. It allows open-ended exploration (within the limitations of the data available and the curation processes and research questions behind it) and supports serendipitous discovery of content. It supports curiosity. It supports arguments but allows them to be interrogated through the mapped content. It supports layers of spatial narratives but does not require them. It should be compatible with humanities work: it's citable (e.g. it provides a URL for the view used to construct an argument) and provides access to its sources, whether as data downloads or citations. It can include different map layers (e.g. historic maps) as well as different data sources. It could be topological as well as cartographic. It must be usable at different scales: in the user interface (e.g. when zoomed out, it gives a sense of the density of information within) and in space (e.g. it can deal with different levels of granularity).

Essential functions: it must be queryable and browseable.  It must support large, variable, complex, messy, fuzzy, multi-scalar data. It should be able to include entities such as real and imaginary people and events as well as places within spaces.  It should support both use for presentation of content and analytic use. It should be compelling – people should want to explore other places, times, relationships or sources. It should be intellectually immersive and support 'flow'.

Looking at it now, the first part is probably pretty close to how I would have defined it at the start, but my thinking about what this actually means in terms of specifications is the result of the conversations over the past week and the experience everyone brings from their own research and projects.

For me, this Institute has been a chance to hang out with ace people with similar interests and different backgrounds – it might mean we spend some time trying to negotiate discipline-specific language but it also makes for a richer experience.  It's a chance to work with wonderfully messy humanities data, and to work out how digital tools and interfaces can support ambiguous, subjective, uncertain, imprecise, rich, experiential content alongside the highly structured data GIS systems are good at.  It's also a chance to test these ideas by putting them into practice with a dataset on religion in Indianapolis and learn more about deep maps by trying to build one (albeit in three days).

As part of thinking about what I think a deep map is, I found myself going back to an embarrassingly dated post on ideas for location-linked cultural heritage projects:

I've always been fascinated with the idea of making the invisible and intangible layers of history linked to any one location visible again. Millions of lives, ordinary or notable, have been lived in London (and in your city); imagine waiting at your local bus stop and having access to the countless stories and events that happened around you over the centuries. … The nice thing about local data is that there are lots of people making content; the not nice thing about local data is that it's scattered all over the web, in all kinds of formats with all kinds of 'trustability', from museums/libraries/archives, to local councils to local enthusiasts and the occasional raving lunatic. … Location-linked data isn't only about official cultural heritage data; it could be used to display, preserve and commemorate histories that aren't 'notable' or 'historic' enough for recording officially, whether that's grime pirate radio stations in East London high-rise roofs or the sites of Turkish social clubs that are now new apartment buildings. Museums might not generate that data, but we could look at how it fits with user-generated content and with our collecting policies.

Amusingly, four years ago my obsession with 'open sourcing history' was apparently already well-developed and I was asking questions about authority and trust that eventually informed my PhD – questions I hope we can start to answer as we try to make a deep map.  Fun!

Finally, my thanks to the NEH and the Institute organisers and the support staff at the Polis Center and IUPUI for the opportunity to attend.

Drinking about museums: the New York edition, June 15

Inspired by Koven J. Smith and Kathleen Tinworth's 'Drinking About Museums' in Denver and Ed Rodley's version in Boston, we're drinking about museums (and libraries and archives) in New York this Friday (June 15, 2012), and you're invited!  Since I'm only in NYC for a week and still get confused about whether I'm heading uptown or downtown at any given time, Neal Stimler @nealstimler has kindly taken care of organising things.  If you're interested in coming, let him know so you can grab his contact details and we know to keep an eye out for you.
We're heading to K2 Friday Night at the Rubin Museum of Art, 150 W. 17th St., NYC 10011. We'll be there from 6:30 until closing at 10pm. The table is booked for Mia Ridge, and we should have enough room that you can just turn up and grab a seat. Entry to the gallery is free from 6 to 10pm, and the K2 Lounge serves food.

If you've got any questions, just leave a comment or @-mention me (@mia_out) on Twitter. We'll also keep an eye on the #drinkingaboutmuseums tag.

Well, gosh.

If you see this post it means… I'm on a bus to Heathrow. I'm on my way to New York for a week's residency at the Cooper-Hewitt, then on to Indianapolis for an NEH Institute for Advanced Topics in the Digital Humanities on 'Spatial Narrative and Deep Maps: Explorations in the Spatial Humanities', and since I'm not sure when I'll next have time to post, I thought I'd leave you with this little provocation:

Museums should stick to what they do best – to preserve, display, study and where possible collect the treasures of civilisation and of nature. They are not fit to do anything else. It is this single rationale for the museum that makes each one unique, which gives each its own distinctive character. It is the hard work of scholars and curators in their own areas of expertise that attracts visitors. Everybody knows that the harder you try to win friends and ingratiate yourself with people, the more you repel them. It would seem however that those running our new museums need to learn afresh this simple human lesson.

Source: Josie Appleton, "Museums for 'The People'?" in 'Museums and their Communities', edited by Sheila Watson (2007).

If that polemic has depressed you too much, you can read this inspiring article instead, 'The wide open future of the art museum: Q&A with William Noel':

We just think that Creative Commons data is real data. It’s data that people can really use. It’s all about access, and access is about several things: licensing and publishing the raw data. Any data that you capture should be available to the public. … The other important thing is to put the data in places where people can find it… The Walters is a museum that’s free to the public, and to be public these days is to be on the Internet. Therefore to be a public museum your digital data should be free. And the great thing about digital data, particularly of historic collections, is that they’re the greatest advert that these collections have. … The digital data is not a threat to the real data, it’s just an advertisement that only increases the aura of the original…

…people go to the Louvre because they’ve seen the Mona Lisa; the reason people might not be going to an institution is because they don’t know what’s in your institution. Digitization is a way to address that issue, in a way that with medieval manuscripts, it simply wasn’t possible before. People go to museums because they go and see what they already know, so you’ve got to make your collections known. Frankly, you can write about it, but the best thing you can do is to put out free images of it. This is not something you do out of generosity, this is something you do because it makes branding sense, and it even makes business sense. So that’s what’s in it for the institution.

The other main reason to do it is to increase the knowledge of and research on your collection by the people, which has to be part of your mission at least, even in the most conservative of institutions. 

Btw, if you're in New York and fancy meeting up for a coffee before June 17, drop me a line in the comments or @mia_out.  (Or ditto for Indianapolis June 17-30).

Frequently Asked Questions about crowdsourcing in cultural heritage

Over time I've noticed the repetition of various misconceptions and apprehensions about crowdsourcing for cultural heritage and digital history, so since this is a large part of my PhD topic I thought I'd collect various resources together as I work to answer some FAQs. I'll update this post over time in response to changes in the field, my research and comments from readers. While this is partly based on some writing for my PhD, I've tried not to be too academic, and where possible I've gone for publicly accessible sources like blog posts rather than sending you to a journal paywall.

If you'd rather watch a video than read, check out the Crowdsourcing Consortium for Libraries and Archives (CCLA)'s 'Crowdsourcing 101: Fundamentals and Case Studies' online seminar.

[Last updated: February 2016, to address 'crowdsourcing steals jobs'. Previous updates added a link to CCLA events, crowdsourcing projects to explore and a post on machine learning+crowdsourcing.]

What is crowdsourcing?

Definitions are tricky. Even Jeff Howe, the author of 'Crowdsourcing', has two definitions:

The White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.

The Soundbyte Version: The application of Open Source principles to fields outside of software.

For various reasons, the term 'crowdsourcing' isn't appropriate for many cultural heritage projects, but it's such neat shorthand that it'll stick until something better comes along. Trevor Owens (@tjowens) has neatly problematised this in The Crowd and The Library:

'Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved large and massive crowds and they have very little to do with outsourcing labor. … They are about inviting participation from interested and engaged members of the public [and] continue a long standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods'

Defining crowdsourcing in cultural heritage

To summarise my own thinking and the related literature, I'd define crowdsourcing in cultural heritage as an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation.

Screenshot from 'Letters of 1916' project.

Who is 'the crowd'?

Good question! One tension underlying the 'openness' of the call to participate in cultural heritage is the gap between the theoretical reach of a project (i.e. everybody) and its practical reach: the subset of 'everybody' with access to the materials needed (like a computer and an internet connection), the skills, experience and time… While 'the crowd' may carry connotations of 'the mob', in 'Digital Curiosities: Resource Creation Via Amateur Digitisation', Melissa Terras (@melissaterras) points out that many 'amateur' content creators are 'extremely self motivated, enthusiastic, and dedicated' and test the boundaries 'between definitions of amateur and professional, work and hobby, independent and institutional'. She also quotes Leadbeater and Miller's 'The Pro-Am Revolution' on the Pro-Am, someone who pursues an activity 'as an amateur, mainly for the love of it, but sets a professional standard'.

There's more and more talk of 'community-sourcing' in cultural heritage, and it's a useful distinction but it also masks the fact that nearly all crowdsourcing projects in cultural heritage involve a community rather than a crowd, whether they're the traditional 'enthusiasts' or 'volunteers', citizen historians, engaged audiences, whatever.  That said, Amy Sample Ward has a diagram that's quite useful for planning how to work with different groups. It puts the 'crowd' (people you don't know), 'network' (the community of your community) and 'community' (people with a relationship to your organisation) in different rings based on their closeness to you.

'The crowd' is differentiated not just by their relationship to your organisation or by their skills and abilities, but also by their motivations for participating: some people participate in crowdsourcing projects for altruistic reasons, others because doing so furthers their own goals.

I'm worried about crowdsourcing because…

…isn't letting the public in like that just asking for trouble?

@lottebelice said she'd heard people worry that 'people are highly likely to troll and put in bad data/content/etc on purpose', but this rarely happens. People worried about this with user-generated content, too, and while kids in galleries delight in leaving rude messages about each other, such behaviour is rare online.

It's much more likely that people will mistakenly add bad data, but a good crowdsourcing project should build any necessary data validation into the project. Besides, there are generally much more interesting places to troll than a cultural heritage site.
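As a rough illustration of what 'building validation into the project' can mean, here is a minimal sketch of one common approach, assuming each item (say, a transcribed name) is shown to several contributors independently and a value is only accepted when enough of them agree; everything else goes to a review queue. The function name, the thresholds and the normalisation step are hypothetical choices for illustration, not taken from any particular project.

```python
from collections import Counter


def validate_field(entries, min_votes=3, min_share=0.6):
    """Return ('accepted', value) if independent contributors agree, else ('review', None).

    Hypothetical thresholds: at least `min_votes` non-empty entries, and the most
    common (normalised) value must account for at least `min_share` of them.
    """
    # Ignore trivial differences in case and spacing before comparing entries.
    normalised = [" ".join(e.split()).lower() for e in entries if e.strip()]
    if len(normalised) < min_votes:
        return ("review", None)  # not enough independent entries yet
    value, count = Counter(normalised).most_common(1)[0]
    if count / len(normalised) >= min_share:
        return ("accepted", value)
    return ("review", None)  # contributors disagree, so a person should check it


print(validate_field(["Mary Smith", "mary  smith", "Mary Smyth", "Mary Smith"]))
# ('accepted', 'mary smith')
print(validate_field(["Mary Smith", "Mary Smyth", "M. Smith"]))
# ('review', None)
```

Requiring agreement between independent contributors means a single mistaken (or, rarely, mischievous) entry is outvoted rather than published; the trade-off is that each item has to be seen by several people before it counts.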

And as Matt Popke pointed out in a comment, 'When you have thousands of people contributing to an entry you have that many more pairs of eyes watching it. It's like having several hundred editors and fact-checkers. Not all of them are experts, but not all of them have to be. The crowd is effectively self-policing because when someone trolls an entry, somebody else is sure to notice it, and they're just as likely to fix it or report the issue'. If you're really worried about this, an earlier post, 'Designing for participatory projects: emergent best practice', has some other tips.

…doesn't crowdsourcing take advantage of people?

XKCD on the ethics of commercial crowdsourcing

Sadly, yes, some of the activities that are labelled 'crowdsourcing' do. Design competitions that expect lots of people to produce full designs and pay a pittance (if anything) to the winner are rightly hated. (See antispec.com for more and a good list of links).

But in cultural heritage, no. Museums, galleries, libraries, archives and academic projects are in the fortunate position of having interesting work that involves an element of social good, and they also have hugely varied work, from microtasks to co-curated research projects. Crowdsourcing is part of a long tradition of volunteering and altruistic participation, and to quote Owens again, 'Crowdsourcing is a concept that was invented and defined in the business world and it is important that we recast it and think through what changes when we bring it into cultural heritage.'

[Update, May 2013: it turns out museums aren't immune from the dangers of design competitions and spec work: I've written On the trickiness of crowdsourcing competitions to draw some lessons from the Sydney Design competition kerfuffle.]

Anyway, crowdsourcing won't usually work if it's not done right. From A Crowd Without Community – Be Wary of the Mob:

"when you treat a crowd as disposable and anonymous, you prevent them from achieving their maximum ability. Disposable crowds create disposable output. Simply put: crowds need a sense of identity and community to achieve their potential."

…crowdsourcing can't be used for academic work

Reasons given include 'humanists don't like to share their knowledge' with just anyone. And it's possible that they don't, but as projects like Transcribe Bentham and Trove show, academics and other researchers will share the work that helps produce that knowledge. (This is also something I'm examining in my PhD. I'll post some early findings after the Digital Humanities 2012 conference in July).

Looking beyond transcription and other forms of digitisation, it's worth checking out Prism, 'a digital tool for generating crowd-sourced interpretations of texts'.

…it steals jobs

Once upon a time, people starting a career in academia or cultural heritage could get jobs as digitisation assistants, or they could work on a scholarly edition. Sadly, that's not the case now, but that's probably more to do with year upon year of funding cuts. Blame the bankers, not the crowdsourcers.

The good news? Crowdsourcing projects can create jobs – participatory projects need someone to act as community liaison, to write the updates that demonstrate the impact of crowdsourced contributions, to explain the research value of the project, to help people integrate it into teaching, to organise challenges and editathons and more.

What isn't crowdsourcing?

…'the wisdom of the crowds'?

Which is not just another way of saying 'crowd psychology', either (another common furphy). As Wikipedia puts it, 'the wisdom of the crowds' is based on 'diverse collections of independently-deciding individuals'. Handily, Trevor Owens has just written a post addressing the topic: Human Computation and Wisdom of Crowds in Cultural Heritage.

…user-generated content

So what's the difference between crowdsourcing and user-generated content? The lines are blurry, but crowdsourcing is inherently productive – the point is to get a job done, whether that's identifying people or things, creating content or digitising material.

Conversely, the value of user-generated content lies in the act of creating it rather than in the content itself – for example, museums might value the engagement in a visitor thinking about a subject or object and forming a response to it in order to comment on it. Once posted it might be displayed as a comment or counted as a statistic somewhere but usually that's as far as it goes.

And as @sherah1918 pointed out, there's a difference between asking for assistance with tasks and asking for feedback or comments: 'A comment book or a blog w/comments isn't crowdsourcing to me … nor is asking ppl to share a story on a web form. That is a diff appr to collecting & saving personal histories, oral histories'.

…other things that aren't crowdsourcing:

[Heading inspired by Sheila Brennan @sherah1918]

  • Crowdfunding (it's often just asking for micro-donations, though it seems that successful crowdfunding projects have a significant public engagement component, which brings them closer to the concerns of cultural heritage organisations. It's also not that new. See Seventeenth-century crowd funding for one example.)
  • Data-mining social media and other content (though I've heard this called 'passive' or 'implicit' crowdsourcing)
  • Human computation (though it might be combined with crowdsourcing)
  • Collective intelligence (though it might also be combined with crowdsourcing)
  • General calls for content, help or participation (see 'user-generated content') or vaguely asking people what they think about an idea. Asking for feedback is not crowdsourcing. Asking for help with your homework isn't crowdsourcing, as it only benefits you.
  • Buzzwords applied to marketing online. And as @emmclean said, "I think many (esp mkting) see "crowdsourcing" as they do "viral" – just happens if you throw money at it. NO!!! Must be great idea" – it must make sense as a crowdsourced task.

Ok, so what's different about crowdsourcing in cultural heritage?

For a start, the process is as valuable as the result. Owens has a great post on this, Crowdsourcing Cultural Heritage: The Objectives Are Upside Down, where he says:

'The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches… Far better than being an instrument for generating data that we can use to get our collections more used it is actually the single greatest advancement in getting people using and interacting with our collections. … At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory … it is about providing meaningful ways for the public to enhance collections while more deeply engaging and exploring them'.

And as I've said elsewhere, 'playing [crowdsourcing] games with museum objects can create deeper engagement with collections while providing fun experiences for a range of audiences'. (For definitions of 'engagement', see The Culture and Sport Evidence (CASE) programme (2011), 'Evidence of what works: evaluated projects to drive up engagement' (PDF).)

What about cultural heritage and citizen science?

[This was written in 2012. I've kept it for historical reasons but think differently now.]

First, another definition. As Fiona Romeo writes, 'Citizen science projects use the time, abilities and energies of a distributed community of amateurs to analyse scientific data. In doing so, such projects further both science itself and the public understanding of science'. As Romeo points out in a different post, 'All citizen science projects start with well-defined tasks that answer a real research question', while citizen history projects rarely if ever seem to be based around specific research questions but are aimed more generally at providing data for exploration. Process vs product?

I'm still thinking through the differences between citizen science and citizen history, particularly where they meet in historical projects like Old Weather. Both citizen science and citizen history achieve some sort of engagement with the mindset and work of the equivalent professional occupations, but are the traditional differences between scientific and humanistic enquiry apparent in crowdsourcing projects? Are tools developed for citizen science suitable for citizen history? Does it make a difference that it's easier to take a new interest in history further without a big investment in learning and access to equipment?

I have a feeling that 'citizen science' projects are often more focused on the production of data as accurately and efficiently as possible, and 'citizen history' projects end up being as much about engaging people with the content as they are about content production. But I'm very open to challenges on this…

What kind of cultural heritage stuff can be crowdsourced?

I wrote this list of 'Activity types and data generated' over a year ago for my Masters dissertation on crowdsourcing games for museums and a subsequent paper for Museums and the Web 2011, Playing with Difficult Objects – Game Designs to Improve Museum Collections (which also lists validation types and requirements).  This version should be read in the light of discussion about the difference between crowdsourcing and user-generated content and in the context of things people can do with museums and with games, but it'll do for now:

  • Tagging (e.g. steve.museum, Brooklyn Museum Tag! You're It; variations include two-player 'tag agreement' games like Waisda?, extensions such as guessing games e.g. GWAP ESP Game, Verbosity, Tiltfactor Guess What?; structured tagging/categorisation e.g. GWAP Verbosity, Tiltfactor Cattegory). Data generated: tags; folksonomies; multilingual term equivalents; structured tags (e.g. 'looks like', 'is used for', 'is a type of').
  • Debunking (e.g. flagging content for review and/or researching and providing corrections). Data generated: flagged dubious content; corrected data.
  • Recording a personal story. Data generated: oral histories; contextualising detail; eyewitness accounts.
  • Linking (e.g. linking objects with other objects, objects to subject authorities, objects to related media or websites; e.g. MMG Donald). Data generated: relationship data; contextualising detail; information on history, workings and use of objects; illustrative examples.
  • Stating preferences (e.g. choosing between two objects e.g. GWAP Matchin; voting on or 'liking' content). Data generated: preference data; subsets of 'highlight' objects; 'interestingness' values for content or objects for different audiences. May also provide information on reason for choice.
  • Categorising (e.g. applying structured labels to a group of objects, collecting sets of objects or guessing the label for or relationship between presented set of objects). Data generated: relationship data; preference data; insight into audience mental models; group labels.
  • Creative responses (e.g. write an interesting fake history for a known object or purpose of a mystery object). Data generated: relevance; interestingness; ability to act as social object; insight into common misconceptions.

You can also divide crowdsourcing projects into 'macro' and 'micro' tasks: giving people a goal and letting them solve it as they prefer, versus asking for small, well-defined pieces of work, as in the 'Umbrella of Crowdsourcing' at The Daily Crowdsource. There's also a fair bit of academic literature on other ways of categorising and describing crowdsourcing.

Using crowdsourcing to manage crowdsourcing

There's also a growing body of literature on ecosystems of crowdsourcing activities, where different tasks and platforms target different stages of the process. A great example is Brooklyn Museum’s ‘Freeze Tag!’, a game that cleans up data added in their tagging game. An ecosystem of linked activities (or games) can maximise the benefits of a diverse audience by providing a range of activities designed for different types of participant skills, knowledge, experience and motivations; and can encompass different levels of participation, from liking to tagging to finding facts and links.

A participatory ecosystem can also resolve some of the difficulties around validating specialist tags or long-form, more subjective content by circulating content between activities for validation and ranking for correctness, 'interestingness' (etc) by other players (see for example the 'Contributed data lifecycle' diagram in my MW2011 paper or the 'Digital Content Life Cycle' for crowdsourcing in Oomen and Aroyo's paper below). As Nina Simon said in The Participatory Museum, 'By making it easy to create content but impossible to sort or prioritize it, many cultural institutions end up with what they fear most: a jumbled mass of low-quality content'. Crowdsourcing the improvement of cultural heritage data would also make possible non-crowdsourcing engagement projects that need better content to be viable.
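To make the idea of circulating content between activities a little more concrete, here is a minimal sketch of such a lifecycle, assuming three hypothetical stages: content contributed in one activity waits as 'pending', other players vote on whether it's correct in a validation activity, and only validated content is passed on to a ranking activity where players rate its 'interestingness'. The class, function names, vote threshold and 1–5 rating scale are all illustrative assumptions, not a reproduction of either lifecycle diagram mentioned above.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Contribution:
    text: str
    status: str = "pending"  # pending -> validated or rejected
    votes: list = field(default_factory=list)    # True means "looks correct"
    ratings: list = field(default_factory=list)  # hypothetical 1-5 'interestingness' scores


def record_vote(item, is_correct, needed=3):
    """Validation activity: other players judge whether a contribution is correct."""
    if item.status != "pending":
        return  # already decided; later votes are ignored
    item.votes.append(is_correct)
    if len(item.votes) >= needed:
        # Hypothetical rule: a simple majority of the first `needed` votes decides.
        item.status = "validated" if sum(item.votes) * 2 > len(item.votes) else "rejected"


def record_rating(item, score):
    """Ranking activity: only validated content is offered for rating."""
    if item.status == "validated":
        item.ratings.append(score)


def ranked(items):
    """Validated items, most 'interesting' first (unrated items sort last)."""
    validated = [i for i in items if i.status == "validated"]
    return sorted(validated, key=lambda i: mean(i.ratings) if i.ratings else 0, reverse=True)


# Usage: three tags contributed for the same (imaginary) object.
tags = [Contribution("steam engine"), Contribution("teapot"), Contribution("locomotive")]
for vote in (True, True, False):
    record_vote(tags[0], vote)
for vote in (False, False, True):
    record_vote(tags[1], vote)
for vote in (True, True, True):
    record_vote(tags[2], vote)
record_rating(tags[0], 3)
record_rating(tags[2], 5)
record_rating(tags[1], 5)  # ignored: rejected content never reaches the ranking activity
print([t.text for t in ranked(tags)])  # ['locomotive', 'steam engine']
```

Separating the stages this way means each activity can be designed for a different kind of participant: quick yes/no checks for newcomers, more subjective ranking for people who have already shown they know the collection.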

See also Raddick, MJ, and Georgia Bracey. 2009. “Citizen Science: Status and Research Directions for the Coming Decade” on bridging between old and new citizen science projects to aid volunteer retention, and Nov, Oded, Ofer Arazy, and David Anderson. 2011. “Dusting for Science: Motivation and Participation of Digital Citizen Science Volunteers” on creating 'dynamic contribution environments that allow volunteers to start contributing at lower-level granularity tasks, and gradually progress to more demanding tasks and responsibilities'.
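To make the idea of circulating contributions for validation and ranking a bit more concrete, here's a rough sketch in Python. It isn't how Freeze Tag! or any other project actually works; the Contribution class, the vote fields and the min_votes and min_agreement thresholds are all hypothetical. Other players' judgements on a tag accumulate until there are enough to decide, and accepted tags are then ordered by how interesting players rated them.

```python
from dataclasses import dataclass, field

@dataclass
class Contribution:
    """A hypothetical user-contributed item, e.g. a tag or a short text."""
    text: str
    correct_votes: int = 0      # other players who judged it correct
    incorrect_votes: int = 0    # other players who judged it wrong
    interest_scores: list = field(default_factory=list)  # e.g. 1-5 ratings

def mean_interest(item):
    """Average 'interestingness' rating, or 0.0 if nobody has rated it yet."""
    scores = item.interest_scores
    return sum(scores) / len(scores) if scores else 0.0

def validate_and_rank(items, min_votes=3, min_agreement=0.7):
    """Keep items enough players judged correct, most interesting first."""
    accepted = []
    for item in items:
        total = item.correct_votes + item.incorrect_votes
        if total < min_votes:
            continue  # not enough judgements yet: circulate it to more players
        if item.correct_votes / total < min_agreement:
            continue  # players mostly rejected it
        accepted.append(item)
    return sorted(accepted, key=mean_interest, reverse=True)
```

With these made-up thresholds, an item needs at least three judgements and 70% agreement before it's accepted. Anything still short of three votes simply waits for more players, which is where linked activities help.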

What does the future of crowdsourcing hold?

Platforms aimed at bootstrapping projects – that is, getting new projects up and running as quickly and as painlessly as possible – seem to be the next big thing. Designing tasks and interfaces suitable for mobile phones and tablets will allow even more of us to help out while killing time. There's also a lot of work on the integration of machine learning and human computation; my post 'Helping us fly? Machine learning and crowdsourcing' has more on this.

Find out how crowdsourcing in cultural heritage works by exploring projects

Spend a few minutes with some of the projects listed in Looking for (crowdsourcing) love in all the right places to really understand how and why people participate in cultural heritage crowdsourcing.

Where can I find out more? (AKA, a reading list in disguise)

There's a lot of academic literature on all kinds of aspects of crowdsourcing, but I've gone for sources that are accessible both intellectually and in terms of licensing. If a key reference isn't there, it might be because I can't find a pre-print or whatever outside a paywall – let me know if you know of one!

Liked this post? Buy the book! 'Crowdsourcing Our Cultural Heritage' is available through Ashgate or your favourite bookseller…

Thanks, and over to you!

Thanks to everyone who responded to my call for their favourite 'misconceptions and apprehensions about crowdsourcing (esp in history and cultural heritage)', and to those who inspired this post in the first place by asking questions in various places about the negative side of crowdsourcing.  I'll update the post as I hear of more, so let me know your favourites.  I'll also keep adding links and resources as I hear of them.

You might also be interested in: Notes from 'Crowdsourcing in the Arts and Humanities' and various crowdsourcing classes and workshops I've run over the past few years.

Museums and the audience comments paradox

I was at the Imperial War Museum for an advisory board meeting for the Social Interpretation project recently, and had a chance to reflect on my experiences with previous audience participation projects.  As Claire Ross summarised it, the Social Interpretation project is asking: does applying social media models to collections successfully increase engagement and reach?  And what forms of moderation work in that environment – can the audience be trusted to behave appropriately?

One topic for discussion yesterday was whether the museum should do some 'gardening' on the comments.  Participation rates are relatively high but some of the comments are nonsense ('asdf'), repetitive (thousands of variants of 'Cool' or 'sad') or off-topic ('I like the museum') – a pattern probably common to many museum 'have your say' kiosks.  Gardening could involve 'pruning' out comments that were not directly relevant to the question asked in the interactive, or finding ways to surface the interesting comments.  While there are models available in other sectors (e.g. newspapers), I'm excited by the possibility that the Social Interpretation project might have a chance to address this issue for museums.

A big design challenge for high-traffic 'have your say' interactives is providing a quality experience for the audience reading the comments – they shouldn't have to wade through screens of repeated, vacuous or rude comments to find the gems – while appropriately respecting the contribution and personal engagement of the person who left the comment.
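As one possible starting point, here's a minimal Python sketch of automated 'gardening'. It isn't what the Social Interpretation project did; the LOW_VALUE word list and the three-word minimum are hypothetical heuristics that would need tuning against real kiosk data. Short or keyboard-mash comments are tallied rather than deleted (so they could still be shown as '42 people said cool'), and exact repeats of longer comments are collapsed. Note that an off-topic comment like 'I like the museum' still gets through: judging relevance needs a person, or something more sophisticated.

```python
import re
from collections import Counter

# Hypothetical list of low-information comments; tune from your own data.
LOW_VALUE = {"cool", "sad", "nice", "wow", "asdf", "lol"}

def normalise(comment):
    """Lower-case, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", comment.lower())
    return " ".join(text.split())

def garden(comments, min_words=3):
    """Return comments worth surfacing, plus a tally of short/repeated ones."""
    surfaced, seen = [], set()
    tally = Counter()
    for comment in comments:
        key = normalise(comment)
        if not key:
            continue  # nothing left after stripping punctuation
        if key in LOW_VALUE or len(key.split()) < min_words:
            tally[key] += 1  # counted, not thrown away
            continue
        if key in seen:
            continue  # repeat of a comment that's already surfaced
        seen.add(key)
        surfaced.append(comment)
    return surfaced, tally
```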

In the spirit of 'have your say', what do you think the solution might be?  What have you tried (successfully or not) in your own projects, or seen working well elsewhere?

Update: the Social Interpretation team have posted 'I iz in ur xhibition trolling ur comments':

"One of the most discussed issues was about what we have termed ‘gardening comments’ but to put it bluntly it’s more a case of should we be ‘curating the visitor voice’ in order to improve the visitor experience? It’s a difficult question to deal with… 

We are at the stage where we really do want to respect the commenter, but also want to give other readers a high value experience. It’s a question of how we do that, and will it significantly change the project?"

If you liked this post, you might also be interested in Notes from 'The Shape of Things: New and emerging technology-enabled models of participation through VGC'.

Update, March 2014: I've just been reading a journal article on 'Normative Influences on Thoughtful Online Participation'. The authors set out to test this hypothesis:

'Individuals exposed to highly thoughtful behavior from others will be more thoughtful in their own online comment contributions than individuals exposed to behavior exhibiting a low degree of thoughtfulness.' 

Thoughtful comments were defined by the number of words, how many seconds it took to write them, and how much of the content was relevant to the issue discussed in the original post. And the results? 'We found significant effects of social norm on all three measures related to participants’ commenting behavior. Relative to the low thoughtfulness condition, participants in the high thoughtfulness condition contributed longer comments, spent more time writing them, and presented more issue-relevant thoughts.' To me, this suggests that it's worth finding ways to highlight the more thoughtful comments (and to keep pulling out those 'asdf' weeds) in an interactive, as this may encourage other thoughtful comments in turn.

Reference: Sukumaran, Abhay, Stephanie Vezich, Melanie McHugh, and Clifford Nass. “Normative Influences on Thoughtful Online Participation.” In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, 3401–10. Vancouver, BC, Canada: ACM, 2011. http://dl.acm.org/citation.cfm?id=1979450.
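If you wanted to use those three measures to pick comments to feature, a rough Python sketch might look like the one below. The paper used the measures to assess behaviour, not to rank comments, so the equal weighting, the caps (max_words, max_seconds) and the keyword-overlap stand-in for 'issue relevance' are all my assumptions. It also assumes a hypothetical kiosk that logs how long each comment took to write.

```python
import re
from dataclasses import dataclass

@dataclass
class Comment:
    text: str
    seconds_to_write: float  # assumes the interactive logs this

def relevance(text, issue_keywords):
    """Share of the issue keywords that appear in the comment (0.0 to 1.0)."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if not issue_keywords:
        return 0.0
    return sum(1 for kw in issue_keywords if kw in words) / len(issue_keywords)

def thoughtfulness(comment, issue_keywords, max_words=100, max_seconds=300):
    """Unweighted mean of three capped measures, each scaled to 0.0-1.0."""
    length = min(len(comment.text.split()), max_words) / max_words
    effort = min(comment.seconds_to_write, max_seconds) / max_seconds
    return (length + effort + relevance(comment.text, issue_keywords)) / 3

def most_thoughtful(comments, issue_keywords, n=3):
    """The n highest-scoring comments, e.g. to show first on screen."""
    return sorted(comments, key=lambda c: thoughtfulness(c, issue_keywords),
                  reverse=True)[:n]
```

Keyword overlap is a crude proxy for relevance, so any featured comments would still benefit from a human glance before they go up.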

Slow and still dirty Digital Humanities Australasia notes: day 3

These are my very rough notes from day 3 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Quick and dirty Digital Humanities Australasia notes: day 2), held at the Australian National University in Canberra at the end of March.

We were welcomed to Day 3 by the ANU's Professor Marnie Hughes-Warrington (who expressed her gratitude for the methodological and social impact of digital humanities work) and Dr Katherine Bode.  The keynote was Dr Julia Flanders on 'Rethinking Collections', AKA 'in praise of collections'… [See also Axel Brun's live blog.]

She started by asking what we mean by a 'collection'? What's the utility of the term? What's the cultural significance of collections? The term speaks of agency, motive, and implies the existence of a collector who creates order through selectivity. Sites like eBay, Flickr and Pinterest are responding to a weirdly deep-seated desire to reassert the ways in which things belong together. The term 'collection' implies that a certain kind of completeness may be achieved. Each item is important in itself and also in relation to other items in the collection.

There's a suite of expected activities and interactions in the genre of digital collections, projects, etc. They're deliberate aggregations of materials that bear, and demand, individual scrutiny. Attention is given to the value of scale (and distant reading) which reinforces the aggregate approach…

She discussed the value of deliberate scope, deliberate shaping of collections, not craving 'everythingness'. There might also be algorithmically gathered collections…

She discussed collections she's involved with – TAPAS, DHQ, Women Writers Online – which all use flavours of TEI and the same publishing logic and component stack, providing the same functionality in the service of the same kinds of activities, though they work with different materials for different purposes.

What constitutes a collection? How are curated collections different to user-generated content or just-in-time collections? Back 'then', collections were things you wanted in your house or wanted to see in the same visit. What does the 'now' of collections look like? Decentralisation in collections 'now'… technical requirements are part of the intellectual landscape, part of larger activities of editing and design. A crucial characteristic of collections is the variety of philosophical urgency they respond to.

The electronic operates under the sign of limitless storage… potentially boundless inclusiveness. Design logic is a craving for elucidation, more context, the ability for the reader to follow any line of thought they might be having and follow it to the end. Unlimited informational desire, closing in of intellectual constraints. How do boundedness and internal cohesion help define the purpose of a collection? Deliberate attempt at genre not limited by technical limitations. Boundedness helps define and reflect philosophical purpose.

What do we model when we design and build digital collections? We're modelling the agency through which the collection comes into being and is sustained through usage. Design is a collection of representational practices, item selection, item boundaries and contents. There's a homogeneity in the structure, the markup applied to items. Item-to-item interconnections – there's the collection-level 'explicit phenomena' – the directly comparable metadata through which we establish cross-sectional views through the collection (eg by Dublin Core fields) which reveal things we already know about texts – authorship of an item, etc. There's also collection-level 'implicit phenomena' – informational commonalities, patterns that emerge or are revealed through inspection; change shape imperceptibly through how data is modelled or through software used [not sure I got that down right]; they're always motivated so always have a close connection with method.

Readerly knowledge – what can the collection assume about what the reader knows? A table of contents is only useful if you can recognise the thing you want to find in it – they're not always self-evident. How does the collection's modelling affect us as readers? Consider the effects of choices on the intellectual ecology of the collection, including its readers. Readerly knowledge has everything to do with what we think we're doing in digital humanities research.

The Hermeneutics of Screwing Around (pdf). Searching produces a dynamically located just-in-time collection… Search is an annoying guessing game with a passive-aggressive collection. But we prefer to ask a collection to show its hand in a useful way (i.e. browse)… Search -> browse -> explore.

What's the cultural significance of collections? She referenced Liu's Sidney's Technology… A network as flow of information via connection, perpetually ongoing contextualisation; a patchwork is understood as an assemblage, it implies a suturing together of things previously unrelated. A patchwork asserts connections by brute force. A network assumes that connections are there to be discovered, connected to. Patchwork, mosaic – connects pre-existing nodes that are acknowledged to be incommensurable.

We avow the desirability of the network, yet we're aware of the itch of edge cases, data that can't be brought under rule. What do we treat as noise and what as signal, what do we deny is the meaning of the collection? Is exceptionality or conformance to type the most significant case? On twitter, @aylewis summarised this as 'Patchworking metaphor lets us conceptualise non-conformance as signal not noise'

Pay attention to the friction in the system, rather than smoothing it over. Collections both express and support analysis. Expressing theories of genre etc in internal modelling… Patchwork – the collection articulates the scholarly interest that animated its creation but also interests of the reader… The collection is animated by agency, is modelled by it, even while it respects the agency we bring as readers. Scholarly enquiry is always a transaction involving agency on both ends.

My (not very good) notes from discussion afterwards… there was a question about digital femmage; discussion of the tension between the desire for transparency and the desire to permit many viewpoints on material while not disingenuously disavowing the roles in shaping the collection; the trend at one point for factoids rather than narratives (but people wanted the editors' view as a foundation for what they do with that material); the logic of the network – a collection as a set of parameters not as a set of items; Alan Liu's encouragement to continue with theme of human agency in understanding what collections are about (e.g. solo collectors like John Soane); crowdsourced work is important in itself regardless of whether it comes up with the 'best' outcome, by whatever metric. Flanders: 'the commitment to efficiency is worrisome to me, it puts product over people in our scale of moral assessment' [hoorah! IMO, engagement is as important as data in cultural heritage]; a question about the agency of objects, with the answer that digital surrogates are carriers of agency, the question is how to understand that in relation to object agency?

GIS and Mapping I

The first paper was 'Mapping the Past in the Present' by Andrew Wilson, which was a fast run-through of some lovely examples based on Sydney's geo-spatial history. He discussed the spatial turn in history, and the mid-20thC shift to broader scales, territories of shared experience, the on-going concern with the description of space, its experience and management.

He referenced Deconstructing the map, Harley, 1989, 'cartography is seldom what the cartographers say it is'. All maps are lies. All maps have to be read, closely or distantly. He referenced Grace Karskens' On the rocks and discussed the reality of maps as evidence, an expression of European expansion; the creation of the maps is an exercise in power. Maps must be interpreted as evidence. He talked about deriving data from historic maps, using regressive analysis to go back in time through the sources. He also mentioned TGIS – time-enabled GIS. Space-time composite model – when you have lots and lots of temporal changes, you create polygons that describe every change in the sequence.

The second paper was 'Reading the Text, Walking the Terrain, Following the Map: Do We See the Same Landscape?' by Øyvind Eide. He said that viewing a document and seeing a landscape are often represented as similar activities… but seeing a landscape means moving around in it, being an active participant. Wood (2010) on the explosion of maps around 1500 – part of the development of the modern state. We look at older maps through modern eyes – maps weren't made for navigation but to establish the modern state.

He's done a case study on text v maps in Scandinavia, 1740s. What is lost in the process of converting text to maps? Context, vagueness, under-specification, negation, disjunction… It's a combination of too little and too much. Text has information that can't fit on a map and text that doesn't provide enough information to make a map. Under-specification is when a verbal text describes a spatial phenomenon in a way that can be understood in two different ways by a competent reader. How do you map a negative feature of a landscape? i.e. things that are stated not to be there. 'Or' cannot be expressed on a map… Different media, different experiences – each can mediate only certain aspects of total reality (Ellestrom 2010).

The third paper was 'Putting Harlem on the Map' by Stephen Robertson. This article on 'Writing History in the Digital Age' is probably a good reference point: Putting Harlem on the Map, the site is at Digital Harlem. The project sources were police files, newspapers, organisational archives… They were cultural historians, focussed on individual level data, events, what it was like to live in Harlem. It was one of the first sites to employ the geo-spatial web rather than GIS software. Information was extracted and summarised from primary sources, [but] it wasn't a digitisation project. They presented their own maps and analysis apart from the site to keep it clear for other people to do their work.  After assigning a geo-location it is then possible to compare it with other phenomena from the same space. They used sources that historians typically treat as ephemera such as society or sports pages as well as the news in newspapers.

He showed a great list of event types they've gotten from the data… Legal categories disaggregate crime, so it appears more often in the list even though it was a minority of the data. Location types also offer a picture of the community.

Creating visualisations of life in the neighbourhood… when mapping at this detailed scale they were confronted with how vague most historical sources are and how they're related to other places. 'Historians are satisfied in most cases to say that a place is 'somewhere in Harlem'.' He talked about visualisations as 'asking, but not explaining, why there?'.

I tweeted that I'd gotten a lot more from his demonstration of the site than I had from looking at it unaided in the past, which led to a discussion with @claudinec and @wragge about whether the 'search vs browse' accessibility issue applies to geospatial interfaces as well as text or images (i.e. what do you need to provide on the first screen to help people get into your data project) and about the need for as many hooks into interfaces as possible, including narratives as interfaces.

Crowdsourcing was raised during the questions at the end of the session, but I've forgotten who I was quoting when I tweeted, 'by marginalising crowdsourcing you're marginalising voices', on the other hand, 'memories are complicated'.  I added my own point of view, 'I think of crowdsourcing as open source history, sometimes that's living memory, sometimes it's research or digitisation'.  If anything, the conference confirmed my view that crowdsourcing in cultural heritage generally involves participating in the same processes as GLAM staff and humanists, and that it shouldn't be exploitative or rely on user experience tricks to get participants (though having made crowdsourcing games for museums, I obviously don't have a problem with making the process easier to participate in).

The final paper I saw was Paul Vetch, 'Beyond the Lowest Common Denominator: Designing Effective Digital Resources'. He discussed the design tensions between: users, audiences (and 'production values'); ubiquity and trends; experimentation (and failure); sustainability (and 'the deliverable').

In the past digital humanities has compartmentalised groups of users in a way that's convenient but not necessarily valid. But funding pressure to serve wider audiences means anticipating lots of different needs. He said people make value judgements about the quality of a resource according to how it looks.

Ubiquity and trends: understanding what users already use; designing for intuition. Established heuristics for web design turn out to be completely at odds with how users behave.

Funding bodies expect deliverables, this conditions the way they design. It's difficult to combine: experimentation and high production values [something I've posted on before, but as Vetch said, people make value judgements about the quality of a resource according to how it looks so some polish is needed]; experimentation and sustainability…

Who are you designing for? Not the academic you're collaborating with, and it's not to create something that you as a developer would use. They're moving away from user testing at the end of a project to doing it during the project. [Hoorah!]

Ubiquity and trends – challenges include a very highly mediated environment; highly volatile and experimental… Trying to use established user conventions becomes stifling. (He called useit.com 'old nonsense'!) The ludic and experiential are increasingly important elements in how we present our research back.

Mapping Medieval Chester took technology designed for delivering contextual ads and used it to deliver information in context without changing perspective (i.e. without reloading the page, from memory).  The Gough map was an experiment in delivering a large image but also in making people smile.  Experimentation and failure… Online Chopin Variorum Edition was an experiment. How is the 'work' concept challenged by the Chopin sources? Technical methodological/objectives: superimposition; juxtaposition; collation/interpolation…

He discussed coping strategies for the Digital Humanities: accept and embrace the ephemerality of web-based interfaces; focus on process and experience – the underlying content is persistent even if the interfaces don't last.  I think this was a comment from the audience: 'if a digital resource doesn't last then it breaks the principle of citation – where does that leave scholarship?'

Summary

So those are my notes.  For further reference I've put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it's not on Australian time so it needs transposing to match the session times.

This was my first proper big Digital Humanities conference, and I had a great time.  It probably helped that I'm an Australian expat so I knew a sprinkling of people and had a sense of where various institutions fitted in, but the crowd was also generally approachable and friendly.

I was also struck by the repetition of phrases like 'the digital deluge', the 'tsunami of data' – I had the feeling there's a barely managed anxiety about coping with all this data. And if that's how people at a digital humanities conference felt, how must less-digital humanists feel?

I was pleasantly surprised by how much digital history content there was, and even more pleasantly surprised by how many GLAMy people were there, and consequently how much the experience and role of museums, libraries and archives was reflected in the conversations.  This might not have been as obvious if you weren't on twitter – there was a bigger disconnect between the back channel and conversations in the room than I'm used to at museum conferences.

As I mentioned in my day 1 and day 2 posts, I was struck by the statement that 'history is on a different evolutionary branch of digital humanities to literary studies', partly because even though I started my PhD just over a year ago, I've felt the title will be outdated within a few years of graduation.  I can see myself being more comfortable describing my work as 'digital history' in future.

I have to finish by thanking all the speakers, the programme committee, and in particular, Dr Paul Arthur and Dr Katherine Bode, the organisers and the aaDH committee – the whole event went so smoothly you'd never know it was the first one!

And just because I loved this quote, one final tweet from @mikejonesmelb: Sir Ken Robinson: 'Technology is not technology if it was invented before you were born'.

Museum Computer Network 2011 conference notes

Last November I went to the Museum Computer Network (MCN2011) conference for the first time – I was lucky enough to get a scholarship (for which many, many thanks).  The theme was 'hacking the museum: innovation, agility and collaboration' and the conference was packed with interesting sessions.  My rough notes are below, though they're probably even sketchier than usual because I had a pretty full conference (running a workshop, taking part in a panel and a debate).  (I thought I'd posted this at the time, but I just found it in draft, so here goes…)

Pre-conference workshop, Wednesday
I ran a half-day workshop on 'Hacking and mash-ups for beginners', which had a great turn-out of people willing to get stuck in.  The basic idea was to give people a first go at scripting 'hello world' and a bit beyond (with JavaScript, because it can be run locally), to provide some insight into thinking computationally (understanding something of how programmers think and how ideas might be turned into something on a screen), to play with real museum data and try different visualisation tools to create simple mashups.  My slides and speaker notes are at Hacking and mash-ups for beginners at MCN2011 and I'd be happy to share the exercises on request.  I used lots of cooking/food analogies so have a snack to hand in case the slides make you hungry! I had lots of good feedback from the workshop, but I think my favourite comment was this from Katie Burns (@K8burns): '…I loved the workshop. I nerded out and kept playing with your exercises on my flight home from ATL.'

Thursday
Kevin Slavin's (@slavin_fpo) thought-provoking keynote took us to Walter Benjamin by way of the Lascaux Caves and onto questions like: what does it do to us [as writers of wall captions and object labels] when objects provide information?  He observed, 'visitors turn to the caption as if the work of art is a question to be answered' – are we reducing the work to information?  We should be evoking, rather than educating; amplifying rather than answering the question; producing a memory instead of preserving one; making the moment in which you're actually present more precious… Ultimately, the authenticity of his experience [with the artwork in the caves] was in learning how to see it [in the context, the light in which it was created]. Kevin concluded that technology is not about giving additional things to look at, but additional ways to see.

I've posted about the panel I was on after the keynote, 'What's the point of a museum website?', in Report from 'What's the point of a museum website'… and Brochureware, aggregators and the messy middle: what's the point of a museum website?  I also popped into the session 'Valuing Online-only Visitors: Let's Get Serious', which was grappling with many of the issues raised by Culture 24's action research project, How to evaluate success online?  This all seems to point to a growing momentum for finding new measurable models for value and engagement, possibly including online to on-site conversion, impact, even epiphanies. Interestingly, crowdsourcing is one place where it's relatively easy to place a monetary value on online action – @alastairdunning popped up to say: 'http://www.oucs.ox.ac.uk/ww1lit/ project – 'Normal' digitisation = £40 per item. Crowdsourced = £3.50 per item', adding 'But obviously cultural value of a Wilfred Owen mss is more than your neighbour's WW1 letters and diaries'.

Friday
One of the sessions I was most looking forward to was Online cataloguing tools and strategies, as it covered crowdsourcing, digital scholarly practices and online collections – some of my favourite things!

Digital Mellini turned a 17th-century Italian manuscript (an inventory of paintings written in rhyming verse) into an online publication and a collaboration tool for scholars. The project asked 'What will digital art history look like?'.  The old way of doing art history was about solo exploration, verbal idea-sharing, physical book publications, unlinked data, image rights issues; but the promise of digital scholarship is: linked data opens new routes to analysis, scholars collaborate online, conversations are captured, digital-only publications count for tenure, no copyright restrictions… I was impressed by their team-based, born-digital approach, even if it's not their norm: 'the process was very non-Getty, it was iterative and agile'.  They had a solid set of requirements, including annotations and conversations at the word or letter level of the text, with references to related artworks. They're now tackling 'rules of engagement' for scholars – where to comment, etc – and working out what an online publication looks like and how it affects scholarly practices.

Yale Center for British Art (YCBA) Online Collections's goal was search across all YCBA collections.  All the work they've done is open source – Solr, Lucene – cool!  They're also using LIDO (superseding CDWA and MuseumDat) and looking to linked data including vocabulary harmonisation.  As with many cross-catalogue projects, they ended up using a lowest common denominator between collections and had to compromise on shared fields in search.  I'm not sure who used the lovely phrase 'dedication to public domain'… Both art history presentations mentioned linked data – we've come far!
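The 'lowest common denominator' compromise is easy to see in a toy example. This Python sketch isn't the YCBA's code (they used Solr and Lucene) and the schemas and records are made up; it just finds the fields every collection shares and flattens each record onto them before indexing, which shows what gets left behind (here, medium and binding).

```python
def shared_fields(schemas):
    """Fields present in every collection's schema: the common denominator."""
    schema_sets = [set(fields) for fields in schemas.values()]
    return set.intersection(*schema_sets) if schema_sets else set()

def to_search_docs(records_by_collection, fields):
    """Project each record onto the shared fields and note its source."""
    docs = []
    for collection, records in records_by_collection.items():
        for record in records:
            doc = {f: record.get(f) for f in sorted(fields)}
            doc["collection"] = collection  # keep provenance for filtering
            docs.append(doc)
    return docs

# Hypothetical schemas and records, for illustration only.
schemas = {
    "paintings": ["title", "maker", "date", "medium", "dimensions"],
    "prints": ["title", "maker", "date", "plate_state"],
    "rare_books": ["title", "maker", "date", "binding"],
}
records = {
    "paintings": [{"title": "A View", "maker": "Painter X",
                   "date": "1760", "medium": "oil on canvas"}],
    "rare_books": [{"title": "A Tour", "maker": "Author Y",
                    "date": "1790", "binding": "calf"}],
}
```

Calling to_search_docs(records, shared_fields(schemas)) gives two documents with only title, maker, date and collection: that's the compromise on shared fields in miniature.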

The final paper was Crowdsourcing transcription: who, why, what and how, with Perian Sully from Balboa Park talking with Ben Brumfield about how they've used his 'From the Page' transcription software.  Transcription is not only useful because you can't do OCR on cursive writing, but it's also a form of engagement and outreach (as I've found with other cultural heritage crowdsourcing).  They covered some similar initiatives like Family Search Indexing, whose goal is to get 175,000 new users volunteering to transcribe records (they've already transcribed close to a billion records) and the Historic Journals project, whose goal is to link transcriptions with records in genealogy databases (and lots more examples, but these were most relevant to my PhD research).

Reasons for crowd participation (from an ornithology project survey) included the importance of the programme, filling free time, love of nature, civic duty and school requirement.  People participate for a sense of purpose, love of the subject, immersion in the text (deep reading). The question of fun leads into the peril of gamification – if you split text line by line to make a microtask-style game, you lose the interesting context.

They gave some tips on how to start a crowdsourced transcription project based on your material and the uses for your transcription.  The design will also affect interpretive decisions made when transcribing – do you try to replicate the line structure on the page? – and can provide incentives like competition to transcribe more materials, though as Perian pointed out, accuracy can be affected by motivation.

I had to leave Philosophical Leadership Needed for the Future: Digital Humanities Scholars in Museums early but it all made a lot more sense to me when I realised Neal wasn't using 'digital humanities' in the sense it's used academically (the application of computational techniques to humanities research questions) – as I see it, he's talking about something much closer to 'digital heritage'.

I still haven't sorted out my notes from History Museums are not Art Museums: Discuss! but it was one of my favourite sessions and a great chance to discuss one of my museumy interests with really smart people.

Saturday
I popped into a bit of THATCamp/CultureHack and had fun playing with an imaginary museum, but unfortunately I didn't get to spend any time in the THATCamp itself, because…

The MCN 'Great Debate'
I was invited to take part in the Great Debate held as the closing plenary session.  I was on the affirmative side with Bruce Wyman, debating 'there are too many museums' against Rob Stein and Roseanna Flouty. For now, I think I'll just say that I think it's the hardest bit of public speaking I've ever done – the trickiness of the question was the least of it!  I think there's a tension between the requirements of the formal debating structure and the desire to dissect the question so you can touch on issues relevant to the audience, so it'll be interesting to see how the format might change in future.

Finally, a silly tweet from me: '#mcn2011 I've decided the perfect visitor-friendly museum is the Mona Lisa on spaceship held by a dinosaur. That you can buy on a t-shirt.' led to the best thing ever from @timsven: '@mia_out- this pic is for you- museum of the future: trex w/ mona lisa riding millenium falcon #MCN2011 http://t.co/37GdAD1O'.

Museum of the Future

'…and they all turn on their computers and say "yay!"' (aka 'mapping for humanists')

I'm spending a few hours of my Sunday experimenting with 'mapping for humanists' with an art historian friend, Hannah Williams (@_hannahwill).  We're going to have a go at solving some issues she has encountered when geo-coding addresses in 17th and 18th Century Paris, and we'll post as we go to record the process and hopefully share some useful reflections on what we found as we tried different tools.

We started by working out what issues we wanted to address.  After some discussion we boiled it down to two basic goals: a) to geo-reference historical maps so they can be used to geo-locate addresses and b) to generate maps dynamically from a list of addresses. This also means dealing with copyright and licensing issues along the way and thinking about how geospatial tools might fit into the everyday working practices of a historian.  (For example, while a tool like Google Refine can easily generate maps, is it usable for people who are more comfortable with Word than with cloud-based services like Google Docs?  And if copyright is a concern, is it as easy to put points on an OpenStreetMap as on a Google Map?)

Like many historians, Hannah's use of maps fell into two main areas: maps as illustrations, and maps as analytic tools.  Maps used for illustration (e.g. in publications) are ideally copyright-free, or can at least be used as illustrative screenshots.  Interactivity is a lower priority for now as the dataset would be private until the scholarly publication is complete (owing to concerns about the lack of an established etiquette and format for citation and credit for online projects).

Maps used for analysis would ideally support layers of geo-referenced historic maps on top of modern map services, allowing historic addresses to be visually located via contemporaneous maps and geo-located via the link to the modern map.  Hannah has been experimenting with finding location data via old maps of Paris in Hypercities, but manually locating 18th Century streets on historic maps then matching those locations to modern maps is time-consuming and she suspects there are more efficient ways to map old addresses onto modern Paris.

Based on my research interviews with historians and my own experience as a programmer, I'd also like to help humanists generate maps directly from structured data (and ideally to store their data in user-friendly tools so that it's as easy to re-use as it is to create and edit).  I'm not sure if it's possible to do this from existing tools or whether they'd always need an export step, so one of my questions is whether there are easy ways to get records stored in something like Word or Excel into an online tool and create maps from there.  Some other issues historians face in using mapping include: imprecise locations (e.g. street names without house numbers); potential changes in street layouts between historic and modern maps; incomplete datasets; using markers to visually differentiate types of information on maps; and retaining descriptive location data and other contextual information.
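As a rough illustration of the 'structured data to map' step, here's a minimal Python sketch. It assumes the addresses have been saved from Excel as a CSV file with hypothetical column names (address, lat, lon, category, notes) and that some rows already have coordinates. Each located row becomes a point in GeoJSON, a widely supported format for web maps, with the descriptive address and notes kept as properties and the category available for styling different types of marker. Rows without usable coordinates (say, a street name that hasn't been pinned down yet) are listed separately rather than silently guessed. The function name and sample values are made up for the example.

```python
import csv
import io
import json


def rows_to_geojson(csv_text):
    """Convert CSV rows with hypothetical columns address, lat, lon,
    category, notes into a GeoJSON FeatureCollection.

    Returns (feature_collection, unlocated), where unlocated lists the
    addresses that had no usable coordinates."""
    features = []
    unlocated = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row["lat"])
            lon = float(row["lon"])
        except (KeyError, TypeError, ValueError):
            # Missing or blank coordinates, e.g. an imprecise location
            # that hasn't been resolved yet
            unlocated.append(row.get("address", ""))
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # Keep the descriptive and contextual data alongside the point
            "properties": {
                "address": row.get("address", ""),
                "category": row.get("category", ""),
                "notes": row.get("notes", ""),
            },
        })
    return {"type": "FeatureCollection", "features": features}, unlocated


if __name__ == "__main__":
    # Illustrative values only, not research data
    sample = (
        "address,lat,lon,category,notes\n"
        "rue Saint-Honoré,48.8630,2.3370,workshop,no house number\n"
        "rue de la Harpe,,,residence,not yet located\n"
    )
    collection, missing = rows_to_geojson(sample)
    print(json.dumps(collection, indent=2, ensure_ascii=False))
    print("Not yet located:", missing)
```

Keeping the unlocated rows visible matters for historians, since an incomplete dataset shown as a tidy map can look more certain than it is.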

Because the challenge is to help the average humanist, I've assumed we should stay away from software that needs to be installed on a server, so to start with we're trying some of the web-based geo-referencing tools listed at http://help.oldmapsonline.org/georeference.

Geo-referencing tools for non-technical people

The first bump in the road was finding maps that are re-usable in technical and licensing terms so that we could link or upload them to the web tools listed at http://help.oldmapsonline.org/georeference.  We've fudged it for now by using a screenshot to try out the tools, but it's not exactly a sustainable solution.

Hannah's been trying georeferencer.org, Hypercities and Heurist (thanks to Lise Summers @morethangrass on twitter) and has written up her findings at Hacking Historical Maps… or trying to.  Thanks also to Alex Butterworth @AlxButterworth and Joseph Reeves @iknowjoseph for suggestions during the day.

Yahoo! Mapmixer's page was a 404 – I couldn't find any reference to the service being closed, but I also couldn't find a current link for it.

Next I tried MetaCarta Labs' Map Rectifier.  Any maps uploaded to this service are publicly visible, and although the site says this does 'not grant a copyright license to other users', it also states that '[t]here is no expectation of privacy or protection of data', which may be a concern for academics negotiating the line between openness and protecting work-in-progress, or for anyone dealing with sensitive data.  Many of the historians I've interviewed for my PhD research feel that some sense of control over who can view and use their data is important, though the reasons why, and how this is manifested, vary.

Screenshot from http://labs.metacarta.com/rectifier/rectify/7192


The site has clear instructions ('double click on the source map… Double click on the right side to associate that point with the reference map') but the search within the right-hand 'reference map' didn't work, and manually navigating to Paris, then to the right section of Paris, was a huge pain.  Neither of the base maps seemed to have labels, so finding the right location at the right level of zoom was too hard and eventually I gave up.  Maybe the service isn't meant to deal with that level of zoom?  We were using a very small section of map for our trials.
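Both Map Rectifier and Map Warper work from pairs of matching points (often called control points) clicked on the historical map and on the modern map. As a rough sketch of why a handful of points is enough, the Python below fits a first-order (affine) transformation to three or more control points by least squares, then returns a function that converts a pixel position on the scanned historical map into longitude and latitude, which is essentially the 'locate on the old map, find on the modern map' step Hannah was doing by hand. This is one common approach, not necessarily what either site does internally. It treats longitude and latitude as flat coordinates, which is only a reasonable approximation over a small area such as our section of Paris, and the function names and coordinates are made up for the example.

```python
def _solve3(m, v):
    """Solve the 3x3 linear system m * p = v using Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        # Singular system, e.g. all control points lie on one line
        raise ValueError("control points are collinear")
    result = []
    for col in range(3):
        # Replace column `col` of m with v and take the ratio of determinants
        mc = [[v[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        result.append(det(mc) / d)
    return result


def fit_affine(control_points):
    """Least-squares affine fit from control points.

    control_points: list of ((x, y), (lon, lat)) pairs, where (x, y) is a
    pixel on the scanned historical map and (lon, lat) is the matching
    point clicked on the modern map.
    Returns a function mapping (x, y) -> (lon, lat)."""
    if len(control_points) < 3:
        raise ValueError("need at least three control points")
    # Build the normal equations (A^T A) p = A^T b, where each row of A is [x, y, 1]
    ata = [[0.0] * 3 for _ in range(3)]
    atb_lon = [0.0] * 3
    atb_lat = [0.0] * 3
    for (x, y), (lon, lat) in control_points:
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atb_lon[i] += row[i] * lon
            atb_lat[i] += row[i] * lat
    # lon = a*x + b*y + c and lat = d*x + e*y + f
    a, b, c = _solve3(ata, atb_lon)
    d, e, f = _solve3(ata, atb_lat)

    def to_lonlat(x, y):
        return (a * x + b * y + c, d * x + e * y + f)

    return to_lonlat


if __name__ == "__main__":
    # Made-up control points: corners of a 1000 x 1000 pixel scan
    points = [
        ((0, 0), (2.30, 48.90)),
        ((1000, 0), (2.40, 48.90)),
        ((0, 1000), (2.30, 48.80)),
        ((1000, 1000), (2.40, 48.80)),
    ]
    to_lonlat = fit_affine(points)
    print(to_lonlat(500, 500))  # roughly (2.35, 48.85)
```

In practice you'd click more than three points, spread across the map. The least-squares fit then averages out small clicking errors, and comparing each control point with where the fitted transformation puts it gives a rough sense of how well the old map lines up with the modern one.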

Inspired by MetaCarta's Map Rectifier, Map Warper was written with OpenStreetMap in mind, which immediately helps us get closer to the goal of images usable in publications.  Map Warper is also used by the New York Public Library, which described it as a 'tool for digitally aligning ("rectifying") historical maps … to match today's precise maps'.  Map Warper also treats uploaded maps as public ('By uploading images to the website, you agree that you have permission to do so, and accept that anyone else can potentially view and use them, including changing control points'), but it does offer 'Map visibility' options: 'Public(default)' and 'Don't list the map (only you can see it)'.

Screenshot showing 'warped' historical map overlaid on OpenStreetMap at http://mapwarper.net/

Once a map is uploaded, it zooms to a 'best guess' location, presumably based on the information you provided when uploading the image.  It's a powerful tool, though I suspect it works better with larger images with more room for error.  Some of the functionality is a little obscure to the casual user – for example, the 'Rectify' view tells me '[t]his map either is not currently masked. Do you want to add or edit a mask now?' without explaining what a mask is.  However, I can live with some roughness around the edges because once you've warped your map (i.e. aligned it with a modern map), there's a handy link on the Export tab, 'View KML in Google Maps' that takes you to your map overlaid on a modern map.  Success!

Sadly not all the export options seem to be complete (they weren't working on my map, anyway) so I couldn't work out if there was a non-geek friendly way to open the map in OpenStreetMap.

We have to stop here for now, but at this point we've met one of our goals, geo-referencing historical maps so that locations from the past can be found in the present; the other will have to wait for another day.  (But I'd probably start with openheatmap.com when we tackle it again.  Any other suggestions would be gratefully received!)

(The title quote is something I heard one non-geek friend say to another to explain what geeks get up to at hackdays. We called our experiment a 'hackday' because we were curious to see whether the format of a hackday, working to meet a challenge within set parameters in a short period of time, would work for other types of projects. While this ended up being almost an 'anti-hack', because I didn't want to write code unless we came across a need for a generic tool, the format worked quite well for getting us to concentrate solidly on a small set of problems for an afternoon.)