
The ever-morphing PhD

I wrote this for the NEH/Polis Summer Institute on deep mapping back in June, but I'm repurposing it as a quick PhD update as I review my call for interview participants. I'm in the middle of interviews at the moment (and if you're an academic historian working on British history 1600-1900 who might be willing to be interviewed, I'd love to hear from you), and after that I'll no doubt be taking stock of the research landscape and the findings from my interviews and project analyses, and updating the shape of my project as we go into the new year. So it doesn't quite reflect where I'm at now, but at the very least it's an insight into the difficulties of researching digital history methodologies when everything is changing so quickly:

"Originally I was going to build a tool to support something like crowdsourced deep mapping through a web application that would let people store and geolocate documents and images they were digitising. The questions that are particularly relevant for this workshop are: what happens when crowdsourcing or citizen history meet deep mapping? Can a deep map created by multiple people for their own research purposes support scholarly work? Can a synthetic, ad hoc collection of information be used to support an argument, or would it be just for the discovery of spatio-temporally relevant material? How would a spatial narrative layer work?

I planned to test this by mapping the lives and intellectual networks of early scientific women. But after conducting a big review of related projects I eventually realised that there's too much similar work going on in the field and that inevitably something similar would have been created by someone with more resources by the time I was writing up. So I had to rethink my question and my methods.

So now my PhD research seeks to answer 'how do academic and family/local historians evaluate, use and contribute to crowdsourced resources, especially geo-located historical materials?', with the goal of providing some insight into the impact of digitality on research practices and scholarship in the humanities. … How do trained and self-taught historians cope with changes in place names and boundaries over time, and with the many variations and similarities in place names? Does it matter if you've never been to the place and don't know that it might be that messy and complex?

I'm interested in how living in a digital culture affects how researchers work. What does it mean to generate as well as consume digital data in the course of research? How does user-created content affect questions of authorship, authority and trust for amateur historians and scholarly practice? What are the characteristics of a well-designed digital resource, and how can resources and tools for researchers be improved? It's a very Human-Computer Interaction/Informatics view of the digital humanities, but it addresses the issues around discoverability and usability that are so important for people building projects.

I'm currently interviewing academic, family and local historians, focusing on those working on research on people or places in early modern England – very loosely defined, as I'll go 1600-1900. I'm asking them about the tools they currently use in their research; how they assess new resources; if or when they might use a resource created through crowdsourcing or user contributions (e.g. Wikipedia or ancestry.com); how they work out which online records to trust; and how they use place names or geographic locations in their research.

So far I've mostly analysed the interviews for how people think about crowdsourcing; I'll be focusing on the responses to place when I get back.

More generally, I'm interested in the idea of 'chorography 2.0' – what would it look like now? The abundance of information is as much of a problem as an opportunity: how to manage that?"

Frequently Asked Questions about crowdsourcing in cultural heritage

Over time I've noticed the repetition of various misconceptions and apprehensions about crowdsourcing for cultural heritage and digital history, so since this is a large part of my PhD topic I thought I'd collect various resources together as I work to answer some FAQs. I'll update this post over time in response to changes in the field, my research and comments from readers. While this is partly based on some writing for my PhD, I've tried not to be too academic and where possible I've gone for publicly accessible sources like blog posts rather than send you to a journal paywall.

If you'd rather watch a video than read, check out the Crowdsourcing Consortium for Libraries and Archives (CCLA)'s 'Crowdsourcing 101: Fundamentals and Case Studies' online seminar.

[Last updated: February 2016, to address 'crowdsourcing steals jobs'. Previous updates added a link to CCLA events, crowdsourcing projects to explore and a post on machine learning+crowdsourcing.]

What is crowdsourcing?

Definitions are tricky. Even Jeff Howe, the author of 'Crowdsourcing', has two definitions:

The White Paper Version: Crowdsourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.

The Soundbyte Version: The application of Open Source principles to fields outside of software.

For many reasons, the term 'crowdsourcing' isn't appropriate for many cultural heritage projects but the term is such neat shorthand that it'll stick until something better comes along. Trevor Owens (@tjowens) has neatly problematised this in The Crowd and The Library:

'Many of the projects that end up falling under the heading of crowdsourcing in libraries, archives and museums have not involved large and massive crowds and they have very little to do with outsourcing labor. … They are about inviting participation from interested and engaged members of the public [and] continue a long standing tradition of volunteerism and involvement of citizens in the creation and continued development of public goods'

Defining crowdsourcing in cultural heritage

To summarise my own thinking and the related literature, I'd define crowdsourcing in cultural heritage as an emerging form of engagement with cultural heritage that contributes towards a shared, significant goal or research area by asking the public to undertake tasks that cannot be done automatically, in an environment where the tasks, goals (or both) provide inherent rewards for participation.

Screenshot from 'Letters of 1916' project.

Who is 'the crowd'?

Good question! One tension underlying the 'openness' of the call to participate in cultural heritage is the difference between the theoretical reach of a project (i.e. everybody) and its practical reach: the subset of 'everybody' with access to the materials needed (like a computer and an internet connection), and with the skills, experience and time… While 'the crowd' may carry connotations of 'the mob', in 'Digital Curiosities: Resource Creation Via Amateur Digitisation', Melissa Terras (@melissaterras) points out that many 'amateur' content creators are 'extremely self motivated, enthusiastic, and dedicated' and test the boundaries 'between definitions of amateur and professional, work and hobby, independent and institutional'. She also quotes Leadbeater and Miller's 'The Pro-Am Revolution' on the Pro-Am, someone who pursues an activity 'as an amateur, mainly for the love of it, but sets a professional standard'.

There's more and more talk of 'community-sourcing' in cultural heritage, and it's a useful distinction but it also masks the fact that nearly all crowdsourcing projects in cultural heritage involve a community rather than a crowd, whether they're the traditional 'enthusiasts' or 'volunteers', citizen historians, engaged audiences, whatever. That said, Amy Sample Ward has a diagram that's quite useful for planning how to work with different groups. It puts the 'crowd' (people you don't know), 'network' (the community of your community) and 'community' (people with a relationship to your organisation) in different rings based on their closeness to you.

'The crowd' is differentiated not just by its relationship to your organisation or by its skills and abilities, but also by its motivations for participating: some people participate in crowdsourcing projects for altruistic reasons, others because doing so furthers their own goals.

I'm worried about crowdsourcing because…

…isn't letting the public in like that just asking for trouble?

@lottebelice said she'd heard people worry that 'people are highly likely to troll and put in bad data/content/etc on purpose' – but this rarely happens. People worried about this with user-generated content, too, and while kids in galleries delight in leaving rude messages about each other, it's rare online.

It's much more likely that people will mistakenly add bad data, but a good crowdsourcing project should build any necessary data validation into the project. Besides, there are generally much more interesting places to troll than a cultural heritage site.

And as Matt Popke pointed out in a comment, 'When you have thousands of people contributing to an entry you have that many more pairs of eyes watching it. It's like having several hundred editors and fact-checkers. Not all of them are experts, but not all of them have to be. The crowd is effectively self-policing because when someone trolls an entry, somebody else is sure to notice it, and they're just as likely to fix it or report the issue'. If you're really worried about this, an earlier post, 'Designing for participatory projects: emergent best practice', has some other tips.

…doesn't crowdsourcing take advantage of people?

XKCD on the ethics of commercial crowdsourcing

Sadly, yes, some of the activities that are labelled 'crowdsourcing' do. Design competitions that expect lots of people to produce full designs and pay a pittance (if anything) to the winner are rightly hated. (See antispec.com for more and a good list of links).

But in cultural heritage, no. Museums, galleries, libraries, archives and academic projects are in the fortunate position of having interesting work that involves an element of social good, and they also have hugely varied work, from microtasks to co-curated research projects. Crowdsourcing is part of a long tradition of volunteering and altruistic participation, and to quote Owens again, 'Crowdsourcing is a concept that was invented and defined in the business world and it is important that we recast it and think through what changes when we bring it into cultural heritage.'

[Update, May 2013: it turns out museums aren't immune from the dangers of design competitions and spec work: I've written On the trickiness of crowdsourcing competitions to draw some lessons from the Sydney Design competition kerfuffle.]

Anyway, crowdsourcing won't usually work if it's not done right. From A Crowd Without Community – Be Wary of the Mob:

"when you treat a crowd as disposable and anonymous, you prevent them from achieving their maximum ability. Disposable crowds create disposable output. Simply put: crowds need a sense of identity and community to achieve their potential."

…crowdsourcing can't be used for academic work

Reasons given include 'humanists don't like to share their knowledge' with just anyone. And it's possible that they don't, but as projects like Transcribe Bentham and Trove show, academics and other researchers will share the work that helps produce that knowledge. (This is also something I'm examining in my PhD. I'll post some early findings after the Digital Humanities 2012 conference in July).

Looking beyond transcription and other forms of digitisation, it's worth checking out Prism, 'a digital tool for generating crowd-sourced interpretations of texts'.

…it steals jobs

Once upon a time, people starting a career in academia or cultural heritage could get jobs as digitisation assistants, or they could work on a scholarly edition. Sadly, that's not the case now, but that's probably more to do with year upon year of funding cuts. Blame the bankers, not the crowdsourcers.

The good news? Crowdsourcing projects can create jobs – participatory projects need someone to act as community liaison, to write the updates that demonstrate the impact of crowdsourced contributions, to explain the research value of the project, to help people integrate it into teaching, to organise challenges and editathons and more.

What isn't crowdsourcing?

…'the wisdom of the crowds'?

Which is not just another way of saying 'crowd psychology', either (another common furphy). As Wikipedia puts it, 'the wisdom of the crowds' is based on 'diverse collections of independently-deciding individuals'. Handily, Trevor Owens has just written a post addressing the topic: Human Computation and Wisdom of Crowds in Cultural Heritage.

…user-generated content

So what's the difference between crowdsourcing and user-generated content? The lines are blurry, but crowdsourcing is inherently productive – the point is to get a job done, whether that's identifying people or things, creating content or digitising material.

Conversely, the value of user-generated content lies in the act of creating it rather than in the content itself – for example, museums might value the engagement in a visitor thinking about a subject or object and forming a response to it in order to comment on it. Once posted it might be displayed as a comment or counted as a statistic somewhere but usually that's as far as it goes.

And as @sherah1918 pointed out, there's a difference between asking for assistance with tasks and asking for feedback or comments: 'A comment book or a blog w/comments isn't crowdsourcing to me … nor is asking ppl to share a story on a web form. That is a diff appr to collecting & saving personal histories, oral histories'.

…other things that aren't crowdsourcing:

[Heading inspired by Sheila Brennan @sherah1918]

Crowdfunding (it's often just asking for micro-donations, though it seems that successful crowdfunding projects have a significant public engagement component, which brings them closer to the concerns of cultural heritage organisations. It's also not that new. See Seventeenth-century crowd funding for one example.)
Data-mining social media and other content (though I've heard this called 'passive' or 'implicit' crowdsourcing)
Human computation (though it might be combined with crowdsourcing)
Collective intelligence (though it might also be combined with crowdsourcing)
General calls for content, help or participation (see 'user-generated content') or vaguely asking people what they think about an idea. Asking for feedback is not crowdsourcing. Asking for help with your homework isn't crowdsourcing, as it only benefits you.
Buzzwords applied to marketing online. And as @emmclean said, "I think many (esp mkting) see "crowdsourcing" as they do "viral" – just happens if you throw money at it. NO!!! Must be great idea" – it must make sense as a crowdsourced task.

Ok, so what's different about crowdsourcing in cultural heritage?

For a start, the process is as valuable as the result. Owens has a great post on this, Crowdsourcing Cultural Heritage: The Objectives Are Upside Down, where he says:

'The process of crowdsourcing projects fulfills the mission of digital collections better than the resulting searches… Far better than being an instrument for generating data that we can use to get our collections more used it is actually the single greatest advancement in getting people using and interacting with our collections. … At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory … it is about providing meaningful ways for the public to enhance collections while more deeply engaging and exploring them'.

And as I've said elsewhere, 'playing [crowdsourcing] games with museum objects can create deeper engagement with collections while providing fun experiences for a range of audiences'. (For definitions of 'engagement' see The Culture and Sport Evidence (CASE) programme. (2011). Evidence of what works: evaluated projects to drive up engagement (PDF).)

What about cultural heritage and citizen science?

[This was written in 2012. I've kept it for historical reasons but think differently now.]

First, another definition. As Fiona Romeo writes, 'Citizen science projects use the time, abilities and energies of a distributed community of amateurs to analyse scientific data. In doing so, such projects further both science itself and the public understanding of science'. As Romeo points out in a different post, 'All citizen science projects start with well-defined tasks that answer a real research question', while citizen history projects rarely if ever seem to be based around specific research questions but are aimed more generally at providing data for exploration. Process vs product?

I'm still thinking through the differences between citizen science and citizen history, particularly where they meet in historical projects like Old Weather. Both citizen science and citizen history achieve some sort of engagement with the mindset and work of the equivalent professional occupations, but are the traditional differences between scientific and humanistic enquiry apparent in crowdsourcing projects? Are tools developed for citizen science suitable for citizen history? Does it make a difference that it's easier to take a new interest in history further without a big investment in learning and access to equipment?

I have a feeling that 'citizen science' projects are often more focused on the production of data as accurately and efficiently as possible, and 'citizen history' projects end up being as much about engaging people with the content as they are about content production. But I'm very open to challenges on this…

What kind of cultural heritage stuff can be crowdsourced?

I wrote this list of 'Activity types and data generated' over a year ago for my Masters dissertation on crowdsourcing games for museums and a subsequent paper for Museums and the Web 2011, Playing with Difficult Objects – Game Designs to Improve Museum Collections (which also lists validation types and requirements). This version should be read in the light of discussion about the difference between crowdsourcing and user-generated content and in the context of things people can do with museums and with games, but it'll do for now:

Activity	Data generated
Tagging (e.g. steve.museum, Brooklyn Museum Tag! You're It; variations include two-player 'tag agreement' games like Waisda?, extensions such as guessing games e.g. GWAP ESP Game, Verbosity, Tiltfactor Guess What?; structured tagging/categorisation e.g. GWAP Verbosity, Tiltfactor Cattegory)	Tags; folksonomies; multilingual term equivalents; structured tags (e.g. 'looks like', 'is used for', 'is a type of').
Debunking (e.g. flagging content for review and/or researching and providing corrections).	Flagged dubious content; corrected data.
Recording a personal story	Oral histories; contextualising detail; eyewitness accounts.
Linking (e.g. linking objects with other objects, objects to subject authorities, objects to related media or websites; e.g. MMG Donald).	Relationship data; contextualising detail; information on history, workings and use of objects; illustrative examples.
Stating preferences (e.g. choosing between two objects e.g. GWAP Matchin; voting on or 'liking' content).	Preference data; subsets of 'highlight' objects; 'interestingness' values for content or objects for different audiences. May also provide information on reason for choice.
Categorising (e.g. applying structured labels to a group of objects, collecting sets of objects or guessing the label for or relationship between presented set of objects).	Relationship data; preference data; insight into audience mental models; group labels.
Creative responses (e.g. write an interesting fake history for a known object or purpose of a mystery object.)	Relevance; interestingness; ability to act as social object; insight into common misconceptions.
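The 'tag agreement' idea in the first row of the table (as in the GWAP ESP Game, where a tag only counts once paired players independently enter the same word) is also a simple form of the built-in data validation mentioned earlier. Here is a minimal sketch in Python, assuming tags arrive as a list per player; the function name `agreed_tags` and the `min_agreement` threshold are hypothetical and not taken from any of the projects listed, and a real project would probably need further checks.

```python
from collections import Counter


def agreed_tags(tags_by_player: dict[str, list[str]], min_agreement: int = 2) -> set[str]:
    """Return tags entered independently by at least `min_agreement` players.

    Hypothetical helper illustrating 'tag agreement'; not code from
    steve.museum, GWAP or any other project named above.
    """
    counts = Counter()
    for tags in tags_by_player.values():
        # Normalise case and surrounding whitespace, then de-duplicate per
        # player so that one person repeating a tag doesn't count as agreement.
        counts.update({tag.strip().lower() for tag in tags})
    return {tag for tag, n in counts.items() if n >= min_agreement}


# Example: three players tagging the same (imaginary) museum object.
players = {
    "player_a": ["Teapot", "silver"],
    "player_b": ["teapot", "Georgian"],
    "player_c": ["silver ", "spoon"],
}
print(sorted(agreed_tags(players)))  # prints ['silver', 'teapot']
```

Raising `min_agreement` trades coverage for confidence: fewer tags survive, but each one has been confirmed by more people.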

You can also divide crowdsourcing projects into 'macro' and 'micro' tasks – giving people a goal and letting them solve it as they prefer, versus small, well-defined pieces of work – as in the 'Umbrella of Crowdsourcing' at The Daily Crowdsource. There's also a fair bit of academic literature on other ways of categorising and describing crowdsourcing.

Using crowdsourcing to manage crowdsourcing

There's also a growing body of literature on ecosystems of crowdsourcing activities, where different tasks and platforms target different stages of the process. A great example is Brooklyn Museum’s ‘Freeze Tag!’, a game that cleans up data added in their tagging game. An ecosystem of linked activities (or games) can maximise the benefits of a diverse audience by providing a range of activities designed for different types of participant skills, knowledge, experience and motivations; and can encompass different levels of participation from liking, to tagging, finding facts and links.

A participatory ecosystem can also resolve some of the difficulties around validating specialist tags or long-form, more subjective content by circulating content between activities, so that other players can validate it and rank it for correctness, 'interestingness' (etc) (see for example the 'Contributed data lifecycle' diagram in my MW2011 paper or the 'Digital Content Life Cycle' for crowdsourcing in Oomen and Aroyo's paper below). As Nina Simon said in The Participatory Museum, 'By making it easy to create content but impossible to sort or prioritize it, many cultural institutions end up with what they fear most: a jumbled mass of low-quality content'. Crowdsourcing the improvement of cultural heritage data would also make possible non-crowdsourcing engagement projects that need better content to be viable.

See also Raddick, MJ, and Georgia Bracey. 2009. “Citizen Science: Status and Research Directions for the Coming Decade” on bridging between old and new citizen science projects to aid volunteer retention, and Nov, Oded, Ofer Arazy, and David Anderson. 2011. “Dusting for Science: Motivation and Participation of Digital Citizen Science Volunteers” on creating 'dynamic contribution environments that allow volunteers to start contributing at lower-level granularity tasks, and gradually progress to more demanding tasks and responsibilities'.

What does the future of crowdsourcing hold?

Platforms aimed at bootstrapping projects – that is, getting new projects up and running as quickly and as painlessly as possible – seem to be the next big thing. Designing tasks and interfaces suitable for mobile and tablets will allow even more of us to help out while killing time. There's also a lot of work on the integration of machine learning and human computation; my post 'Helping us fly? Machine learning and crowdsourcing' has more on this.

Find out how crowdsourcing in cultural heritage works by exploring projects

Spend a few minutes with some of the projects listed in Looking for (crowdsourcing) love in all the right places to really understand how and why people participate in cultural heritage crowdsourcing.

Where can I find out more? (AKA, a reading list in disguise)

Read the original article about crowdsourcing, published in the June, 2006 issue of Wired Magazine or Jeff Howe's book, Crowdsourcing: why the power of the crowd is driving the future of business.
My MW2011 paper, Playing with Difficult Objects – Game Designs to Improve Museum Collections and (deep breath) I've posted my MSc dissertation 'Playing with difficult objects: game designs for crowdsourcing museum metadata' online
From tagging to theorizing: deepening engagement with cultural heritage through crowdsourcing. Curator: The Museum Journal, 56(4) pp. 435–450 (Mia Ridge; link is to university repository version for those who don't have access to Curator)
Workshop activities and notes for 'Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions' (for the British Library's Digital Scholarship programme)
Trevor Owens' blog on 'User Centered Digital History'
Rose Holley's (of Trove fame) blog
@benwbrum's Collaborative Manuscript Transcription blog
Peer-to-peer foundation crowdsourcing page
steve.museum reports – the first museum tagging project
Why Crowdsourcing? Why Scripto?
Clay Shirky, Cognitive Surplus: Creativity and Generosity in a Connected Age
Crowdsourcing in the Cultural Heritage Domain: Opportunities and Challenges, by Johan Oomen and Lora Aroyo (PDF).
Bringing Citizen Scientists and Historians Together
Final report on 'crowd-sourcing in the humanities' of the AHRC Crowd Sourcing Study
Accessible resources on doing crowdsourcing well: try How to Effectively Use Incentives in your Crowdsourcing Project, The Magic of Participation, Building and Maintaining a Vibrant, Creative Crowd.
If you're into mapping, geography or just geospatial work, check out volunteered geographic information (VGI).
Crowdfunding Culture: Namaste, and Welcome to the Smithsonian lists a hit (yoga, a popular topic) and a miss in crowdfunding at the Smithsonian. You might also find CrowdFunding Website Reviews useful.

There's a lot of academic literature on all kinds of aspects of crowdsourcing, but I've gone for sources that are accessible both intellectually and in terms of licensing. If a key reference isn't there, it might be because I can't find a pre-print or whatever outside a paywall – let me know if you know of one!

Liked this post? Buy the book! 'Crowdsourcing Our Cultural Heritage' is available through Ashgate or your favourite bookseller…

Thanks, and over to you!

Thanks to everyone who responded to my call for their favourite 'misconceptions and apprehensions about crowdsourcing (esp in history and cultural heritage)', and to those who inspired this post in the first place by asking questions in various places about the negative side of crowdsourcing. I'll update the post as I hear of more, so let me know your favourites. I'll also keep adding links and resources as I hear of them.

You might also be interested in: Notes from 'Crowdsourcing in the Arts and Humanities' and various crowdsourcing classes and workshops I've run over the past few years.

Slow and still dirty Digital Humanities Australasia notes: day 3

These are my very rough notes from day 3 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Quick and dirty Digital Humanities Australasia notes: day 2), held at the Australian National University in Canberra at the end of March.

We were welcomed to Day 3 by the ANU's Professor Marnie Hughes-Warrington (who expressed her gratitude for the methodological and social impact of digital humanities work) and Dr Katherine Bode. The keynote was Dr Julia Flanders on 'Rethinking Collections', AKA 'in praise of collections'… [See also Axel Bruns' live blog.]

She started by asking what we mean by a 'collection'. What's the utility of the term? What's the cultural significance of collections? The term speaks of agency and motive, and implies the existence of a collector who creates order through selectivity. Sites like eBay, Flickr and Pinterest are responding to a weirdly deep-seated desire to reassert the ways in which things belong together. The term 'collection' implies that a certain kind of completeness may be achieved. Each item is important in itself and also in relation to other items in the collection.

There's a suite of expected activities and interactions in the genre of digital collections, projects, etc. They're deliberate aggregations of materials that bear and demand individual scrutiny. Attention is given to the value of scale (and distant reading), which reinforces the aggregate approach…

She discussed the value of deliberate scope, deliberate shaping of collections, not craving 'everythingness'. There might also be algorithmically gathered collections…

She discussed collections she's involved with – TAPAS, DHQ, Women Writers Online – all using flavours of TEI, the same publishing logic and component stack, and providing the same functionality in the service of the same kinds of activities, though they work with different materials for different purposes.

What constitutes a collection? How are curated collections different to user-generated content or just-in-time collections? Back 'then', collections were things you wanted in your house or wanted to see in the same visit. What does the 'now' of collections look like? Decentralisation in collections 'now'… technical requirements are part of the intellectual landscape, part of larger activities of editing and design. A crucial characteristic of collections is the variety of philosophical urgency they respond to.

The electronic operates under the sign of limitless storage… potentially boundless inclusiveness. Design logic is a craving for elucidation, more context, the ability for the reader to follow any line of thought they might be having and follow it to the end. Unlimited informational desire, closing in of intellectual constraints. How do boundedness and internal cohesion help define the purpose of a collection? Deliberate attempt at genre not limited by technical limitations. Boundedness helps define and reflect philosophical purpose.

What do we model when we design and build digital collections? We're modelling the agency through which the collection comes into being and is sustained through usage. Design is a collection of representational practices, item selection, item boundaries and contents. There's a homogeneity in the structure, the markup applied to items. Item-to-item interconnections – there's the collection-level 'explicit phenomena' – the directly comparable metadata through which we establish cross-sectional views through the collection (eg by Dublin Core fields) which reveal things we already know about texts – authorship of an item, etc. There's also collection-level 'implicit phenomena' – informational commonalities, patterns that emerge or are revealed through inspection; change shape imperceptibly through how data is modelled or through software used [not sure I got that down right]; they're always motivated so always have a close connection with method.

Readerly knowledge – what can the collection assume about what the reader knows? A table of contents is only useful if you can recognise the thing you want to find in it – they're not always self-evident. How does the collection's modelling affect us as readers? Consider the effects of choices on the intellectual ecology of the collection, including its readers. Readerly knowledge has everything to do with what we think we're doing in digital humanities research.

The Hermeneutics of Screwing Around (pdf). Searching produces a dynamically located just-in-time collection… Search is an annoying guessing game with a passive-aggressive collection. But we prefer to ask a collection to show its hand in a useful way (i.e. browse)… Search -> browse -> explore.

What's the cultural significance of collections? She referenced Liu's Sidney's Technology… A network as flow of information via connection, perpetually ongoing contextualisation; a patchwork is understood as an assemblage, it implies a suturing together of things previously unrelated. A patchwork asserts connections by brute force. A network assumes that connections are there to be discovered, connected to. Patchwork, mosaic – connects pre-existing nodes that are acknowledged to be incommensurable.

We avow the desirability of the network, yet we're aware of the itch of edge cases, data that can't be brought under rule. What do we treat as noise and what as signal, what do we deny is the meaning of the collection? Is exceptionality or conformance to type the most significant case? On Twitter, @aylewis summarised this as 'Patchworking metaphor lets us conceptualise non-conformance as signal not noise'.

Pay attention to the friction in the system, rather than smoothing it over. Collections both express and support analysis. Expressing theories of genre etc in internal modelling… Patchwork – the collection articulates the scholarly interest that animated its creation but also interests of the reader… The collection is animated by agency, is modelled by it, even while it respects the agency we bring as readers. Scholarly enquiry is always a transaction involving agency on both ends.

My (not very good) notes from the discussion afterwards… there was a question about digital femmage; discussion of the tension between the desire for transparency and the desire to permit many viewpoints on material while not disingenuously disavowing the roles in shaping the collection; the trend at one point for factoids rather than narratives (but people wanted the editors' view as a foundation for what they do with that material); the logic of the network – a collection as a set of parameters not as a set of items; Alan Liu's encouragement to continue with the theme of human agency in understanding what collections are about (e.g. solo collectors like John Soane); crowdsourced work is important in itself regardless of whether it comes up with the 'best' outcome, by whatever metric. Flanders: 'the commitment to efficiency is worrisome to me, it puts product over people in our scale of moral assessment' [hoorah! IMO, engagement is as important as data in cultural heritage]; a question about the agency of objects, with the answer that digital surrogates are carriers of agency, the question is how to understand that in relation to object agency?

GIS and Mapping I

The first paper was 'Mapping the Past in the Present' by Andrew Wilson, which was a fast run-through of some lovely examples based on Sydney's geo-spatial history. He discussed the spatial turn in history, and the mid-20thC shift to broader scales, territories of shared experience, and the ongoing concern with the description of space, its experience and management.

He referenced Harley's 'Deconstructing the Map' (1989): 'cartography is seldom what the cartographers say it is'. All maps are lies. All maps have to be read, closely or distantly. He referenced Grace Karskens' On the rocks and discussed the reality of maps as evidence, an expression of European expansion; the creation of the maps is an exercise in power. Maps must be interpreted as evidence. He talked about deriving data from historic maps, using regressive analysis to go back in time through the sources. He also mentioned TGIS (time-enabled GIS) and the space-time composite model: when you have lots and lots of temporal changes, you create polygons that describe every change in the sequence.
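A minimal Python sketch of the space-time composite idea, under simplifying assumptions of my own rather than anything Wilson showed: each historical map has already been reduced to a mapping from hypothetical location IDs (standing in for the polygon fragments a real TGIS would compute by overlaying layers) to an attribute such as a parish name. Locations that share exactly the same attribute history across every snapshot form one composite unit, so in the made-up example below a single boundary change in 1850 splits the area into three units.

```python
from collections import defaultdict


def space_time_composite(snapshots):
    """Group locations into composite units with an identical attribute history.

    snapshots: list of (year, {location_id: attribute}) pairs in chronological
    order. A location missing from a snapshot gets None for that year.
    Returns {history_tuple: set_of_location_ids}.
    """
    locations = set()
    for _year, layer in snapshots:
        locations.update(layer)

    units = defaultdict(set)
    for location in locations:
        # The full sequence of attributes this location has had over time.
        history = tuple(layer.get(location) for _year, layer in snapshots)
        units[history].add(location)
    return dict(units)


# Hypothetical example: a boundary change moves location "b" between parishes.
snapshots = [
    (1800, {"a": "Parish X", "b": "Parish X", "c": "Parish Y"}),
    (1850, {"a": "Parish X", "b": "Parish Y", "c": "Parish Y"}),
]
for history, members in space_time_composite(snapshots).items():
    print(history, sorted(members))
```

A change only creates a new unit where it actually happens, which is what lets a composite absorb many temporal changes without redrawing the whole map for every date.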

The second paper was 'Reading the Text, Walking the Terrain, Following the Map: Do We See the Same Landscape?' by Øyvind Eide. He said that viewing a document and seeing a landscape are often represented as similar activities… but seeing a landscape means moving around in it, being an active participant. Wood (2010) on the explosion of maps around 1500 – part of the development of the modern state. We look at older maps through modern eyes – maps weren't made for navigation but to establish the modern state.

He's done a case study on text v maps in Scandinavia in the 1740s. What is lost in the process of converting text to maps? Context, vagueness, under-specification, negation, disjunction… It's a combination of too little and too much: text has information that can't fit on a map, and text that doesn't provide enough information to make a map. Under-specification is when a verbal text describes a spatial phenomenon in a way that can be understood in two different ways by a competent reader. How do you map a negative feature of a landscape, i.e. things that are stated not to be there? 'Or' cannot be expressed on a map… Different media, different experiences – each can mediate only certain aspects of total reality (Elleström 2010).

The third paper was 'Putting Harlem on the Map' by Stephen Robertson. His article of the same name in 'Writing History in the Digital Age' is probably a good reference point, and the site itself is at Digital Harlem. The project sources were police files, newspapers, organisational archives… They were cultural historians, focussed on individual-level data, events, and what it was like to live in Harlem. It was one of the first sites to employ the geo-spatial web rather than GIS software. Information was extracted and summarised from primary sources, [but] it wasn't a digitisation project. They presented their own maps and analysis separately from the site to keep it clear for other people to do their work. After assigning a geo-location to a piece of information, it is then possible to compare it with other phenomena from the same space. They used sources that historians typically treat as ephemera, such as the society or sports pages, as well as the news in newspapers.

He showed a great list of event types they've gotten from the data… Legal categories disaggregate crime, so it appears more often in the list even though it was a minority of the data. The location types also offer a picture of the community.

Creating visualisations of life in the neighbourhood… when mapping at this detailed scale they were confronted with how vague most historical sources are about locations and how they relate to other places. 'Historians are satisfied in most cases to say that a place is "somewhere in Harlem".' He talked about visualisations as 'asking, but not explaining, why there?'.

I tweeted that I'd gotten a lot more from his demonstration of the site than I had from looking at it unaided in the past, which led to a discussion with @claudinec and @wragge about whether the 'search vs browse' accessibility issue applies to geospatial interfaces as well as text or images (i.e. what do you need to provide on the first screen to help people get into your data project) and about the need for as many hooks into interfaces as possible, including narratives as interfaces.

Crowdsourcing was raised during the questions at the end of the session, but I've forgotten who I was quoting when I tweeted, 'by marginalising crowdsourcing you're marginalising voices', on the other hand, 'memories are complicated'. I added my own point of view, 'I think of crowdsourcing as open source history, sometimes that's living memory, sometimes it's research or digitisation'. If anything, the conference confirmed my view that crowdsourcing in cultural heritage generally involves participating in the same processes as GLAM staff and humanists, and that it shouldn't be exploitative or rely on user experience tricks to get participants (though having made crowdsourcing games for museums, I obviously don't have a problem with making the process easier to participate in).

The final paper I saw was Paul Vetch, 'Beyond the Lowest Common Denominator: Designing Effective Digital Resources'. He discussed the design tensions between: users, audiences (and 'production values'); ubiquity and trends; experimentation (and failure); sustainability (and 'the deliverable').

In the past digital humanities has compartmentalised groups of users in a way that's convenient but not necessarily valid. But funding pressure to serve wider audiences means anticipating lots of different needs. He said people make value judgements about the quality of a resource according to how it looks.

Ubiquity and trends: understanding what users already use; designing for intuition. Established heuristics for web design turn out to be completely at odds with how users behave.

Funding bodies expect deliverables, and this conditions the way projects are designed. It's difficult to combine experimentation and high production values [something I've posted on before, but as Vetch said, people make value judgements about the quality of a resource according to how it looks so some polish is needed]; or experimentation and sustainability…

Who are you designing for? Not the academic you're collaborating with, and it's not to create something that you as a developer would use. They're moving away from user testing at the end of a project to doing it during the project. [Hoorah!]

Ubiquity and trends – challenges include a very highly mediated environment; highly volatile and experimental… Trying to use established user conventions becomes stifling. (He called useit.com 'old nonsense'!) The ludic and experiential are increasingly important elements in how we present our research back.

Mapping Medieval Chester took technology designed for delivering contextual ads and used it to deliver information in context without changing perspective (i.e. without reloading the page, from memory). The Gough Map was an experiment in delivering a large image but also in making people smile. Experimentation and failure… Online Chopin Variorum Edition was an experiment. How is the 'work' concept challenged by the Chopin sources? Technical/methodological objectives: superimposition; juxtaposition; collation/interpolation…

He discussed coping strategies for the Digital Humanities: accept and embrace the ephemerality of web-based interfaces; focus on process and experience – the underlying content is persistent even if the interfaces don't last. I think this was a comment from the audience: 'if a digital resource doesn't last then it breaks the principle of citation – where does that leave scholarship?'

Summary

So those are my notes. For further reference I've put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it's not on Australian time so it needs transposing to match the session times.
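If you want to do that transposing in bulk, here's a minimal Python sketch. It rests on assumptions I haven't checked against the file: that the archive's timestamps are UTC, that they sit in a column called created_at in the format shown, and that the conference fell within Australian daylight saving (which ran until 1 April 2012), so Canberra was on AEDT, UTC+11. The function name and sample row are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumption: the archive's timestamps are UTC, and the conference (March 2012)
# fell within Australian daylight saving time, so Canberra was on AEDT (UTC+11).
AEDT = timezone(timedelta(hours=11), "AEDT")
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format, adjust to match the file


def to_canberra_time(rows, column="created_at"):
    """Yield copies of CSV rows with the (hypothetical) timestamp column shifted to AEDT."""
    for row in rows:
        utc = datetime.strptime(row[column], TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
        local = utc.astimezone(AEDT)
        yield {**row, column: local.strftime(TIMESTAMP_FORMAT)}


# A one-row stand-in for the real archive.
sample = "created_at,text\n2012-03-28 23:30:00,example tweet\n"
for row in to_canberra_time(csv.DictReader(io.StringIO(sample))):
    print(row["created_at"], row["text"])  # prints: 2012-03-29 10:30:00 example tweet
```

For the real file, pass a csv.DictReader over the opened CSV (opened with newline='') and change column and TIMESTAMP_FORMAT to whatever the archive actually uses.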

This was my first proper big Digital Humanities conference, and I had a great time. It probably helped that I'm an Australian expat so I knew a sprinkling of people and had a sense of where various institutions fitted in, but the crowd was also generally approachable and friendly.

I was also struck by the repetition of phrases like 'the digital deluge', the 'tsunami of data' – I had the feeling there's a barely managed anxiety about coping with all this data. And if that's how people at a digital humanities conference felt, how must less-digital humanists feel?

I was pleasantly surprised by how much digital history content there was, and even more pleasantly surprised by how many GLAMy people were there, and consequently how much the experience and role of museums, libraries and archives was reflected in the conversations. This might not have been as obvious if you weren't on twitter – there was a bigger disconnect between the back channel and conversations in the room than I'm used to at museum conferences.

As I mentioned in my day 1 and day 2 posts, I was struck by the statement that 'history is on a different evolutionary branch of digital humanities to literary studies', partly because even though I started my PhD just over a year ago, I've felt the title will be outdated within a few years of graduation. I can see myself being more comfortable describing my work as 'digital history' in future.

I have to finish by thanking all the speakers, the programme committee, and in particular, Dr Paul Arthur and Dr Katherine Bode, the organisers and the aaDH committee – the whole event went so smoothly you'd never know it was the first one!

And just because I loved this quote, one final tweet from @mikejonesmelb: Sir Ken Robinson: 'Technology is not technology if it was invented before you were born'.

Museum Computer Network 2011 conference notes

Last November I went to the Museum Computer Network (MCN2011) conference for the first time – I was lucky enough to get a scholarship (for which many, many thanks). The theme was 'hacking the museum: innovation, agility and collaboration' and the conference was packed with interesting sessions. My rough notes are below, though they're probably even sketchier than usual because I had a pretty full conference (running a workshop, taking part in a panel and a debate). (I thought I'd posted this at the time, but I just found it in draft, so here goes…)

Pre-conference workshop, Wednesday
I ran a half-day workshop on 'Hacking and mash-ups for beginners', which had a great turn-out of people willing to get stuck in. The basic idea was to give people a first go at scripting 'hello world' and a bit beyond (with JavaScript, because it can be run locally), to provide some insight into thinking computationally (understanding something of how programmers think and how ideas might be turned into something on a screen), to play with real museum data and try different visualisation tools to create simple mashups. My slides and speaker notes are at Hacking and mash-ups for beginners at MCN2011 and I'd be happy to share the exercises on request. I used lots of cooking/food analogies so have a snack to hand in case the slides make you hungry! I had lots of good feedback from the workshop, but I think my favourite comment was this from Katie Burns (@K8burns): '…I loved the workshop. I nerded out and kept playing with your exercises on my flight home from ATL.'

Thursday
Kevin Slavin's (@slavin_fpo) thought-provoking keynote took us to Walter Benjamin by way of the Lascaux Caves and onto questions like: what does it do to us [as writers of wall captions and object labels] when objects provide information? He observed, 'visitors turn to the caption as if the work of art is a question to be answered' – are we reducing the work to information? We should be evoking, rather than educating; amplifying rather than answering the question; producing a memory instead of preserving one; making the moment in which you're actually present more precious… Ultimately, the authenticity of his experience [with the artwork in the caves] was in learning how to see it [in the context, the light in which it was created]. Kevin concluded that technology is not about giving additional things to look at, but additional ways to see.

I've posted about the panel I was on after the keynote, 'What's the point of a museum website?', at Report from 'What's the point of a museum website'… and Brochureware, aggregators and the messy middle: what's the point of a museum website? I also popped into the session 'Valuing Online-only Visitors: Let's Get Serious' which was grappling with many of the issues raised by Culture 24's action research project, How to evaluate success online?. This all seems to point to growing momentum behind finding new measurable models for value and engagement, possibly including online to on-site conversion, impact, even epiphanies. Interestingly, crowdsourcing is one place where it's relatively easy to place a monetary value on online action – @alastairdunning popped up to say: 'http://www.oucs.ox.ac.uk/ww1lit/ project – 'Normal' digitisation = £40 per item. Crowdsourced = £3.50 per item', adding 'But obviously cultural value of a Wilfred Owen mss is more than your neighbour's WW1 letters and diaries'.
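Taking the two figures in that tweet at face value, the per-item comparison works out as below. It's only arithmetic on the quoted numbers and, as the follow-up tweet says, says nothing about the relative cultural value of the items; for a hypothetical batch of 1,000 items it would be the difference between £40,000 and £3,500.

```latex
\frac{\pounds 40.00}{\pounds 3.50} \approx 11.4
\qquad
\pounds 40.00 - \pounds 3.50 = \pounds 36.50 \text{ saved per item}
```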

Friday
One of the sessions I was most looking forward to was Online cataloguing tools and strategies, as it covered crowdsourcing, digital scholarly practices and online collections – some of my favourite things!

Digital Mellini turned a 17th C Italian manuscript (an inventory of paintings written in rhyming verse) into an online publication and a collaboration tool for scholars. The project asked 'What will digital art history look like?'. The old way of doing art history was about solo exploration, verbal idea-sharing, physical book publications, unlinked data, image rights issues; but the promise of digital scholarship is: linked data opens new routes to analysis, scholars collaborate online, conversations are captured, digital-only publications count for tenure, no copyright restrictions… I was impressed by their team-based, born-digital approach, even if it's not their norm: 'the process was very non-Getty, it was iterative and agile'. They had a solid set of requirements that included annotations and conversations at the word or letter level of the text, with references to related artworks. They're now tackling 'rules of engagement' for scholars – where to comment, etc – and working out what an online publication looks like and how it affects scholarly practices.

The Yale Center for British Art (YCBA) Online Collections project aimed to provide search across all YCBA collections. All the work they've done is open source – Solr, Lucene – cool! They're also using LIDO (superseding CDWA and MuseumDat) and looking to linked data including vocabulary harmonisation. As with many cross-catalogue projects, they ended up using a lowest common denominator between collections and had to compromise on shared fields in search. I'm not sure who used the lovely phrase 'dedication to public domain'… Both art history presentations mentioned linked data – we've come far!

The final paper was Crowdsourcing transcription: who, why, what and how, with Perian Sully from Balboa Park talking with Ben Brumfield about how they've used his 'From the Page' transcription software. Transcription is not only useful because you can't do OCR on cursive writing, but it's also a form of engagement and outreach (as I've found with other cultural heritage crowdsourcing). They covered some similar initiatives like Family Search Indexing, whose goal is to get 175,000 new users volunteering to transcribe records (they've already transcribed close to a billion records) and the Historic Journals project whose goal is to link transcriptions with records in genealogy databases (and lots more examples but these were most relevant to my PhD research).

Reasons for crowd participation (from an ornithology project survey) included the importance of the programme, filling free time, love of nature, civic duty and school requirement. People participate for a sense of purpose, love of the subject, immersion in the text (deep reading). The question of fun leads into the perils of gamification – if you split text line by line to make a microtask-style game, you lose the interesting context.

They gave some tips on how to start a crowdsourced transcription project based on your material and the uses for your transcription. The design will also affect interpretive decisions made when transcribing – do you try to replicate the line structure on the page? – and can provide incentives like competition to transcribe more materials, though as Perian pointed out, accuracy can be affected by motivation.

I had to leave Philosophical Leadership Needed for the Future: Digital Humanities Scholars in Museums early but it all made a lot more sense to me when I realised Neal wasn't using 'digital humanities' in the sense it's used academically (the application of computational techniques to humanities research questions) – as I see it, he's talking about something much closer to 'digital heritage'.

I still haven't sorted out my notes from History Museums are not Art Museums: Discuss! but it was one of my favourite sessions and a great chance to discuss one of my museumy interests with really smart people.

Saturday
I popped into a bit of THATCamp/CultureHack and had fun playing with an imaginary museum, but unfortunately I didn't get to spend any time in the THATCamp itself, because…

The MCN 'Great Debate'
I was invited to take part in the Great Debate held as the closing plenary session. I was on the affirmative side with Bruce Wyman, debating 'there are too many museums' against Rob Stein and Roseanna Flouty. For now, I think I'll just say that I think it's the hardest bit of public speaking I've ever done – the trickiness of the question was the least of it! I think there's a tension between the requirements of the formal debating structure and the desire to dissect the question so you can touch on issues relevant to the audience, so it'll be interesting to see how the format might change in future.

Finally, a silly tweet from me: '#mcn2011 I've decided the perfect visitor-friendly museum is the Mona Lisa on spaceship held by a dinosaur. That you can buy on a t-shirt.' led to the best thing ever from @timsven: '@mia_out- this pic is for you- museum of the future: trex w/ mona lisa riding millenium falcon #MCN2011 http://t.co/37GdAD1O'.

Quick and dirty Digital Humanities Australasia notes: day 1

As always, I should have done this sooner and tidied them up more, but better rough notes than nothing, so here goes… The Australasian Association for Digital Humanities held their inaugural conference in Canberra in March, 2012. You can get an overall sense of the conference from the #DHA2012 tweets (I've put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it's not on Australian time) and from the keynotes.

In his opening keynote on the movements between close and distant reading, Alan Liu observed that the crux of the 'reading' issue depends on the field, and further, that 'history is on a different evolutionary branch of digital humanities to literary studies'. This is something I've been wondering about since finding myself back in digital humanities, and was possibly reflected in the variety of papers in the overall programme. I was generally following sessions on digital history, geospatial themes and crowdsourcing, but there was so much in the programme that you could have followed a literary studies line and had a totally different conference experience.

In the next session I went to a panel on 'Connecting Australia's Cultural Datasets: A Vision for Collaboration' with various people from the new 'Humanities Networked Infrastructure' (HuNI) (more background) presenting. It started with Deb Verhoeven on 'jailbreaking cultural data' and the tension identified by Brand: "information wants to be expensive because it's so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is lower and lower all the time. So you have these two things fighting against each other". 'Information wants to be social': she discussed the need to understand the value of research in terms of community engagement, not just as academically ranked output, and to return research to the communities being investigated in meaningful ways.

Other statements that resonated were the need for organisational, semantic and technical interoperability in datasets to create collaborative environments. Collaboration requires data integration and exchange as well as dealing with different ideas about what 'data' is in different disciplines in the humanities. Collaboration in the cultural datasets community can follow unmet needs: discover data that's currently hidden, make connections between disparate data sources, publish and share connections.

Ross Harley talked about how interoperability facilitates serendipity and trying to find new ways for data to collide. In the questions, Ingrid Mason asked about parallels with the GLAM (galleries, libraries, archives and museums) community, but it was also pointed out that GLAMs are behind in publishing their data – not everything HuNI wants to use is available yet. I pointed out (on the twitter back channel) that requests for GLAM information from intensive users (e.g. researchers) help memory institutions make the case for publishing more data – it's still all a bit chicken-or-the-egg.

After lunch I went to the crowdsourcing session (not least cos I was presenting early results from my PhD in it). The first presentation was on 'crowdsourcing semantic tags on 3D museum artefacts' which could have amazing applications for teaching material culture and criticism as well as source communities because it lets people annotate specific locations on a 3D model. Interestingly, during the questions someone reported that people visiting a campus classics museum said they were enjoying seeing the objects in person but also wanted access to electronic versions – it's fascinating watching audience expectations change.

The next presentation was on 'Optimising crowdsourcing websites to increase volunteer participation' which was a case study of NYPL's What's on the menu by Donelle McKinley who was using MECLAB/Flint McGlaughlin's Conversion Sequence heuristic (clarity of value proposition, motivation, incentive, friction, anxiety) to assess how the project's design was optimised to motivate audience participation. Donelle's analysis is really useful for people thinking about designing for crowdsourcing, but I'm not sure my notes do it justice, and I'm afraid I didn't get many notes for Pauline Cockrill's 'Using Web 2.0 to make new connections in community history' as I was on just afterwards. One point I tweeted was about a quick win for crowdsourcing in using real-world communities as pointers to successful online collaborations, but I'm not sure now who said it.
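For reference, MECLAB's Conversion Sequence heuristic is usually written as below, where C is the probability of conversion, m the motivation of the user, v the clarity of the value proposition, i the incentive to take action, f friction and a anxiety. The coefficients signal the relative importance of each factor; it's intended as a thinking tool for reviewing a page rather than a sum to calculate.

```latex
C = 4m + 3v + 2(i - f) - 2a
```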

One comment I noted during the discussion was "a real pain about Old Weather was that you'd get into working on a ship and it would just sail off on you" – interfaces that work for the organisation don't always work for the audience. This session was generally useful for clarifying my thoughts on the tension between optimising for efficiency or engagement in cultural heritage crowdsourcing projects.

In the interests of getting this posted I'll stop here and call this 'day 1'. I'm not sure if any of the slides are available yet, but I'll update and link to any presentations or other write-ups I find. There's a live blog of many sessions at http://snurb.info/taxonomy/term/137.

[Update: I've posted about Day 2 at Quick and dirty Digital Humanities Australasia notes: day 2 and Slow and still dirty Digital Humanities Australasia notes: day 3.]

Geek for a week: residency at the Powerhouse Museum

I've spent the last week as 'geek-in-residence' with the Digital, Social and Emerging Technologies team at the Powerhouse Museum. I wasn't sure what 'geek-in-residence' would mean in reality, but in this case it turned out to be a week of creativity, interesting constraints and rapid, iterative design.

When I arrived on Monday morning, I had no idea what I'd be working on, let alone how it would all work. By the end of the first day I knew how I'd be working, but not exactly what I'd focus on. I came in with fresh questions on Tuesday, and was sketching ideas by lunchtime. The next few days were spent getting stuck into wireframes to focus in on specific issues within that problem space: I turned initial ideas into wireframes and basic copy, and put them through two rounds of quick-and-dirty testing with members of the public and Powerhouse volunteers. By the time I left on Friday I was able to hand over wireframes for a site called 'conversations about collections' which aims to record people's memories of items from the collection. (I ran out of time to document the technical aspects of how the site could be built in WordPress, but given the skills of the team I think they'll cope.)

The first day and a half were about finding the right-sized problem. In conversations with Paula (Manager of the Visual & Digitisation services team) and Luke (Web Manager), we discussed what each of us was interested in exploring, looking for the intersection between what was possible in the time and with the material to hand.

After those first conversations, I went back to Powerhouse's strategy document for inspiration. If in doubt, go back to the mission! I was looking for a tie-in with their goals – luckily their plan made it easy to see where things might fit. Their strategy talked about ideas and technology that have changed our world and stories of people who create and inspire them, about being open to 'rich engagement, to new conversations about the collections'.

I also considered what could be supported by the existing API, what kinds of activities worked well with their collections and what could be usefully built and tested as paper or on-screen prototypes. Like many large collections, most of the objects lack the types of data that support deeper engagement for non-experts (though the significance statements that exist are lovely).

Two threads emerged from the conversations: bringing social media conversations and activity back into the online collections interfaces to help provide an information scent for users of the site; and crowdsourcing games based around enhancing the collections data.
The first was an approach to the difficulties in surfacing the interesting objects in very large collections. Could you create a 'heat map' based on online activity about objects to help searchers and browsers spot objects that might be more interesting?
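As a rough illustration of that 'heat map' idea, here's a minimal Python sketch. It's not something the Powerhouse built: it assumes a hypothetical list of (object_id, source) activity events, for example one per referral from a social media site, and scores each object relative to the busiest object so the hottest objects could be highlighted in search results.

```python
from collections import Counter


def heat_scores(events):
    """Score objects by online activity, scaled so the busiest object is 1.0.

    events: iterable of (object_id, source) pairs, e.g. one pair per referral
    from a social media site (a hypothetical input format).
    """
    counts = Counter(object_id for object_id, _source in events)
    if not counts:
        return {}
    peak = max(counts.values())
    return {object_id: n / peak for object_id, n in counts.items()}


# Hypothetical activity data for three collection objects.
events = [
    ("obj-1", "pinterest"),
    ("obj-1", "twitter"),
    ("obj-2", "facebook"),
    ("obj-1", "pinterest"),
    ("obj-3", "twitter"),
]
# Hottest objects first, e.g. to highlight them in search results.
for object_id, score in sorted(heat_scores(events).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{object_id}: {score:.2f}")
```

Raw counts will tend to favour objects that are already popular, so a real version might want to weight by recency or by source.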

At one point Nico (Senior Producer) and I had a look at Google Analytics to see what social media sites were sending traffic to the collections and suss out how much data could be gleaned. Collection objects are already showing up on Pinterest, and I had wild thoughts about screen-scraping Pinterest (they have no API) to display related boards on the OPAC search results or object pages…

I also thought about building a crowdsourcing game that would use expert knowledge to add data that would make better games possible for the general public – this would be an interesting challenge, as open-ended activities are harder to score automatically, so you need to design meaningful rewards and ensure there's an audience to help provide them. However, it was probably a bigger task than I had time for, especially with most of the team already busy on other tasks, though I've been interested in that kind of dual-phased project since my MSc project on crowdsourcing games for museums.

But in the end, I went back to two questions: what information is needed about the collections, and what's the best way to get it? We decided to focus on conversations, stories and clues about objects in the collections with a site aimed at collecting 'living memories' about objects by asking people what they remember about an object and how they'd explain it to a kid. The name, 'Conversations about collections' came directly from the strategy doc and was just too neat a description to pass up, though 'memory bank' was another contender.
I ended up with five wireframes (clickable PDF at that link) to cover the main tasks of the site: to persuade people (particularly older people) that their memories are worth sharing, and to get the right object in front of the right person. Explaining more about the designs would be a whole other blog post, but in the interests of getting this post out I'll save that for another day… I'm dashing out this post before I head out, but I'll update in response to questions (and generally expand on things when I have more time).

My week at the Powerhouse was a brilliant chance to think through the differences between history of science/social history objects and art objects, and between history and art museums, but that's for another post (perhaps if I ever get around to posting my notes from the MCN session on a similar topic).

It also helped me reflect on my interests, which I would summarise as 'meaningful audience participation' – activities that are engaging and meaningful for the audience and also add value for the museum, activities that actually change the museum in some way (hopefully for the better!), whether that's through crowdsourcing, co-curation or other types of engagement.

Finally, I owe particular thanks to Paula Bray and Luke Dearnley for running with Seb Chan's original suggestion and for their time and contributions to shaping the project; to Nicolaas Earnshaw for wireframe work and Suse Cairns for going out testing on the gallery floor with me; and to Dan Collins, Estee Wah, Geoff Barker and everyone else in the office and on various tours for welcoming me into their space and their conversations.

Photo: behind the scenes at the (then) Powerhouse Museum, Sydney

'Entrepreneurship and Social Media' and 'Collaborating to Compete'

[Update: I hope the presentations from the speakers are posted, as they were all inspiring in their different ways. Bristol City Council's civic crowdsourcing projects had impressive participation rates, and Phil Higgins identified the critical success factors as: choose the right platform, use it at the right stage, issue must be presented clearly. Joanne Orr talked about museum contexts that are encapsulating the intangible including language and practices (and recording intangible cultural heritage in a wiki) and I could sense the audience's excitement about Andrew Ellis' presentation on 'Your Paintings' and the crowdsourcing tagger developed for the Public Catalogue Foundation.]

I'm in Edinburgh for the Museums Galleries Scotland conference 'Collaborating to Compete'. I'm chairing a session on 'Entrepreneurship and Social Media'. In this context, the organisers defined entrepreneurship as 'doing things innovatively and differently', including new and effective ways of working. This session is all about working in partnerships and collaborating with the public. The organisers asked me to talk about my own research as well as introducing the session. I'm posting my notes in advance to save people having to scribble them down, and I'll try to post back with notes from the session presentations.

Anyway, on with my notes…

Welcome to this session on entrepreneurship and social media. Our speakers are going to share their exciting work with museum collections and cultural heritage. Their projects demonstrate the benefits of community participation, of opening up to encourage external experts to share their knowledge, and of engaging the general public with the task of improving access to cultural heritage for all. The speakers have explored innovative ways of working, including organisational partnerships and low-cost digital platforms like social media. Our speakers will discuss the opportunities and challenges of collaborating with audiences, the issues around authority, identity and trust in user-generated content, and they'll reflect on the challenges of negotiating partnerships with other organisations or with 'the crowd'.

You'll hear about two different approaches to crowdsourcing from Phil Higgins and Andy Ellis, and about how the 'Intangible Cultural Heritage' project helps a diverse range of people collaborate to create knowledge for all.

I'll also briefly discuss my own research into crowdsourcing through games as an example of innovative forms of participation and engagement.

If you're not familiar with the term, crowdsourcing generally means sharing tasks with the public that are traditionally performed in-house.

Until I left to start my PhD, I worked at the Science Museum in London, where I spent a lot of time thinking about how to make the history of science and technology more engaging, and the objects related to it more accessible. This inspired me when I was looking for a dissertation project for my MSc, so I researched and developed 'Museum Metadata Games' to explore how crowdsourcing games could get people to have fun while improving the content around 'difficult' museum objects.

Unfortunately (most) collections sites are not that interesting to the general public. There's a 'semantic gap' between the everyday language of the public and the language of catalogues.

Projects like steve.museum showed crowdsourcing helps, but it can be difficult to get people to participate in large numbers or over a long period of time. Museums can be intimidating, and marketing your project to audiences can be expensive. But what if you made a crowdsourcing interface that made people want to use it, and to tell their friends to use it? Something like… a game?

A lot of people play games… 20 million people in the UK play casual games. And a lot of people play museum games. Games like the Science Museum's Launchball and the Wellcome Collection's High Tea have had millions of plays.

Crowdsourcing games are great at creating engaging experiences. They offer low barriers to participation and are good at keeping people playing. As an example, within one month of launching, DigitalKoot, a game for the National Library of Finland, had 25,000 visitors complete over 2 million individual tasks.

Casual game genres include puzzles, card games or trivia games. You've probably heard of Angry Birds and Solitaire, even if you don’t think of yourself as a 'gamer'.

Casual games are perfect for public participation because they're designed for instant gameplay, and can be enjoyed in a few minutes or played for hours.

Easy, feel-good tasks will help people get started. Strong game mechanics, tested throughout development with your target audience, will motivate on-going play and keep people coming back.

Here’s a screenshot of the games I made.

In the tagging game 'Dora's lost data', the player meets Dora, a junior curator who needs their help replacing some lost data. Dora asks the player to add words that would help someone searching on Google find the object shown.

When audiences can immediately identify an activity as a game (in this case, the use of characters and a minimal narrative really helped), their usual reservations about contributing content to a museum site disappear.

The brilliant thing about game design is that you can tailor tasks and rewards to your data needs, and build tutorials into gameplay to match the player's skills and the game's challenges.

Fun is personal – design for the skills, abilities and motivations of your audience.

People like helping out – show them how their data is used so they can feel good about playing for a few minutes over a cup of tea.

You can make a virtue of the randomness of your content – if people can have fun with 100 historical astronomy objects, they can have fun with anything.
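One way to 'tailor tasks to your data needs' is to serve players the objects that currently have the least data. The sketch below is a hypothetical illustration, not how Museum Metadata Games actually chose objects: it assumes tag counts are held in a simple dictionary keyed by made-up object IDs, and the `next_objects` helper is invented for this example.

```python
import heapq


def next_objects(tag_counts: dict[str, int], n: int = 3) -> list[str]:
    """Pick the n objects with the fewest tags so far.

    Hypothetical helper: `tag_counts` maps an object ID to the number of
    tags players have added. Ties are broken alphabetically by ID so the
    choice is deterministic.
    """
    return heapq.nsmallest(n, tag_counts, key=lambda obj_id: (tag_counts[obj_id], obj_id))


if __name__ == "__main__":
    counts = {"obj-a": 5, "obj-b": 0, "obj-c": 2, "obj-d": 0}
    print(next_objects(counts, n=2))  # ['obj-b', 'obj-d']
```

A real game would probably mix in some variety so that players don't get a long run of near-identical objects, but prioritising under-described records is one simple way to point play at the gaps in the data.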

To conclude, crowdsourcing games can be fun and useful for the public and for museums. And now we're going to hear more about working with the public… [the end!]

Quick PhD update from InterFace 2011

It feels like ages since I've posted, so since I've had to put together a 2 minute lightning talk for the Interface 2011 conference at UCL (for people working at the intersection of humanities and technology), I thought I'd post it here as an update. I'm a few months into the PhD but am still very much working out the details of the shape of my project and I expect that how my core questions around crowdsourcing, digitisation, geolocation, researchers and historical materials fit together will change as I get further into my research. [Basically I'm acknowledging that I may look back at this and cringe.]

Notes for 2 minute lightning talk, Interface 2011

'Crowdsourcing the geolocation of historical materials through participant digitisation'

Hi, I'm Mia, I'm working on a PhD in Digital Humanities in the History department at the Open University.

I'm working on issues around crowdsourcing the digitisation and geolocation of historical materials. I'm looking at 'participant digitisation' so I'll be conducting research and building tools to support various types of researchers in digitising, transcribing and geolocating primary and secondary sources.

I'll also create a spatial interface that brings together the digitised content from all participant digitisers. The interface will support the management of sources based on what I've learned about how historians evaluate potential sources.

The overall process has three main stages: research and observation that leads to iterative cycles of designing, building and testing the interfaces, and finally evaluation and analysis of the tools and the impact of geolocated (ad hoc) collections on the practice of historical research.

Notes from a preview of the updated Historypin

The tl;dr version: inspiring project, great enhancements; yay!

Longer version: last night I went to the offices of We Are What We Do for a preview of the new version of Historypin. Nick Poole has already written up his notes, so I'm just supplementing them with my own notes from the event (and a bit from conversations with people there and the reading I'd already done for my PhD).

Screenshot with photo near WAWWD office (current site)

Historypin is about bridging the intergenerational divide, mass participation and access to history, creating social capital in neighbourhoods, and conserving and opening up global archival resources (at this stage that's photographs, not other types of records). There's a focus on events and activities in local communities. [It'd be great to get kids to do quick oral history interviews as they worked with older people, though I think they're doing something like it already.]

New features will include a lovely augmented reality-style view in streetview; the ability to upload and explore video as well as images; a focus on telling stories – 'tours' let you bring a series of photos together into a narrative (the example was 'the arches of New York', most of which don't exist anymore). You can also create 'collections', which will be useful for institutions. They'll also be available in the mobile apps (and yes, I did ask about the possibility of working with the TourML spec for mobile tours).

The mobile apps let you explore your location, explore the map and contribute directly from your phone. You can use the augmented reality view to overlay old photos onto your camera view so that you can take a modern version of an old photo. This means they can crowdsource better modern images than those available in streetview, as well as indoor shots. This could be a great treasure hunt activity for local communities or tourists. You can also explore collections (as slideshows?) in the app.

They're looking to work with more museums and archives and have been working on a community history project with Reading Museum. Their focus on inclusion is inspiring, and I'll be interested to see how they work to get those images out into the community. While there are quite a few 'then and now' projects around that geo-locate old images, I think that just shows that it's an accessible way of helping people make connections between their lives and those of people in the past.

A quick correction to Nick's comments – the Historypin API doesn't exist yet, so if you have ideas for what it should do, it's probably a good time to get in touch. I'll be thinking hard about how it all relates to my PhD, especially if they're making some of the functionality available.

Rockets, Lockets and Sprockets – towards audience models about collections?

This is something I wrote for my MSc dissertation ('Playing with difficult objects: game designs for crowdsourcing museum metadata'; you can view the games I built for it at http://museumgam.es/ or check out the paper I wrote for Museums and the Web 2011, 'Playing with Difficult Objects – Game Designs to Improve Museum Collections'). It's about the role of 'distinctiveness' in mental models about collections, and is potentially relevant to discussions around telling stories with, and collecting metadata about, museum collections. I'm posting it here for reference in the conversation about instances vs classes of objects that arose on the UKMCG list after the release of NMSI (Science Museum, National Media Museum, National Railway Museum) data as CSV. One reason I've been thinking about 'distinctiveness' is that I'm wondering how we help people find the interesting records – the iconic objects, the intriguing stories – in a collection of 240,000 objects.

I'm interested in audiences' mental models about when a record refers to the type of object vs the individual object – my sense is that 'rockets', in the model below, are generally thought of as the individual object, and that 'sprockets' are thought of as the type of object, but that it varies for 'lockets', depending on how distinctive they are in relation to the person.

I'm also generally curious about the utility of the model, and would love to know of references that might relate to it (whether supporting or otherwise) – if you can think of any, let me know in the comments.

Not all objects are created equal

Both museum objects and the records about them vary in quality. Just as the physical characteristics of one object – its condition, rarity, etc – differ from another, the strength of its associations with important people, events or concepts will also vary. To complicate things further, as the Collections Council of Australia (2009) states, this 'significance' is 'relative, contingent and dynamic'.

When faced with hundreds of thousands of objects, a museum will digitise and describe objects prioritised by 'technical criteria (physical condition of the original material), content criteria (representativeness, uniqueness), and use criteria (demand)' (Karvonen, 2010). In theory, all objects are registered by the collecting institution, so a basic record exists for each. Hopefully, each has been catalogued and the information transcribed or digitised to some extent, but this is often not the case. Records are often missing descriptions, and most lack the contextual histories that would help the general visitor understand their significance. Some objects may only have an accession number and a one word label, while those on display in a museum generally have well-researched metadata, detailed descriptions and related narratives or contextualised histories. Variable image quality (or lack of images) is an issue in collections in general. This project excludes object records without images but does include many poor-quality images as a result of importing records from a bulk catalogue.

This project posits that objects can be placed on a scale of 'distinctiveness' based on their visual attributes and the amount and quality of information about them. Within this project, bulk collections with minimal metadata and distinctiveness have been labelled 'sprockets', the smaller set of catalogued objects with some distinctiveness have been labelled 'lockets', and the unique, iconic objects with a full contextual history have been labelled 'rockets'. This concept also references the English Heritage 'building grades' model (DCMS, 2010). During the project, the labels 'heroic', 'semi-heroic' and 'bulk' objects were also used.

These labels are not concerned with actual 'significance' or other valuation or priority placed on the object, but relate only to the potential mental models around them and data related to them – the potential for players to discover something interesting about them as objects, or whether they can just tag them on visual characteristics.

In theory there is a correlation between the significance of an object and the amount of information available about it; there may be particular opportunities for games where this is not the case.

Project label	Information type	Amount of information	Proportion of collection
Rockets	Subjective	Contextual history ('background, events, processes and influences')	Tiny minority
Lockets	Mostly objective, may be contextual to collection purpose	Catalogued (some description)	Minority
Sprockets	Objective	Registered (minimal)	Majority

Table 1 Objects grouped by distinctiveness

This can also be represented visually as a pyramid model:

Figure 2 A figurative illustration of the relative numbers of different levels of objects in a typical history museum.
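To make the model a little more concrete, here's a minimal sketch that places a record on the scale using only the amount of information it holds, following Table 1. It's an illustrative assumption rather than the method used in the project: the `ObjectRecord` fields and the `classify` function are hypothetical, and the sketch ignores the visual attributes that the model also takes into account.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Distinctiveness(Enum):
    SPROCKET = "sprocket"  # registered only, minimal metadata
    LOCKET = "locket"      # catalogued, with some description
    ROCKET = "rocket"      # full contextual history


@dataclass
class ObjectRecord:
    # Hypothetical fields; a real collections database will use its own schema.
    accession_number: str
    label: str
    description: str = ""
    contextual_history: str = ""


def classify(record: ObjectRecord) -> Distinctiveness:
    """Assign a level from the richest kind of information present in the record."""
    if record.contextual_history.strip():
        return Distinctiveness.ROCKET
    if record.description.strip():
        return Distinctiveness.LOCKET
    return Distinctiveness.SPROCKET


if __name__ == "__main__":
    # Placeholder records, not real collection data.
    records = [
        ObjectRecord("1", "gear"),
        ObjectRecord("2", "locket", description="Example description"),
        ObjectRecord("3", "rocket", description="Example description",
                     contextual_history="Example contextual history"),
    ]
    counts = Counter(classify(r) for r in records)
    print({level.value: n for level, n in counts.items()})
    # {'sprocket': 1, 'locket': 1, 'rocket': 1}
```

Running something like this over a bulk export would be one rough way to check whether the pyramid shape holds for a particular collection, though the field names in any real export would need mapping first.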

References
Department for Culture, Media and Sport (DCMS) (2010). Principles of Selection for Listing Buildings. [Online] Available from: http://www.english-heritage.org.uk/content/imported-docs/p-t/principles-of-selection-for-listing-buildings-2010.pdf

Karvonen, M. (2010). "Digitising Museum Materials – Towards Visibility and Impact". In Pettersson, S., Hagedorn-Saupe, M., Jyrkkiö, T., Weij, A. (Eds) Encouraging Collections Mobility In Europe. Collections Mobility. [Online] Available from: http://www.lending-for-europe.eu/index.php?id=167

Russell, R., and Winkworth, K. (2009). Significance 2.0: a guide to assessing the significance of collections. Collections Council of Australia. [Online] Available from: http://significance.collectionscouncil.com.au/
