My PhD proposal (Provisional title: Participatory digitisation of spatially indexed historical data)

[Update: I'm working on a shorter version with fewer long words. Something like crowdsourcing geolocated historical materials/artefacts with specialist users/academic contributors/citizen historians.]

A few people have asked me about my PhD* topic, and while I was going to wait until I'd started and had a chance to review it in light of the things I'm already starting to learn about what else is going on in the field, I figured I should take advantage of having some pre-written material to cover the gap in blogging while I try to finish various things (like, um, my MSc dissertation) that were hijacked by a broken wrist. So, to keep you entertained in the meantime, here it is.

Please bear in mind that it's already out-of-date in terms of my thinking and sense of what's already happening in the field – I'm really looking forward to diving into it but my plan to spend some time thinking about the project before I started has been derailed by what felt like a year of having an arm in a cast.

* I never got around to posting about this because my disastrous slip on the ice happened just two days after I resigned, but I'm leaving my job at the Science Museum to take up the offer of a full-time PhD in Digital Humanities at the Open University in mid-March.

Provisional title: Participatory digitisation of spatially indexed historical data

This project aims to investigate 'participatory digitisation' models for geo-located historical material.

This project begins with the assumption that researchers are already digitising and geo-locating materials and asks whether it is possible to create systems to capture and share this data. Could the digital records and knowledge generated when researchers access primary materials be captured at the point of creation and published for future re-use? Could the links between materials, and between materials and locations, created when researchers use aggregated or mass-digitised resources, be 'mined' for re-use?

Through the use of a case study based around discovering, collating, transforming and publishing geo-located resources related to early scientific women, the project aims to discover:

  • how geo-located materials are currently used and understood by researchers,
  • what types of tools can be designed to encourage researchers to share records digitised for their own personal use
  • whether tools can be designed to allow non-geospatial specialists to accurately record and discover geo-spatial references
  • the viability of using online geo-coding and text mining services on existing digitised resources

Possible outcomes include an evaluation of spatially-oriented approaches to digital heritage resource discovery and use; mental models of geographical concepts in relation to different types of historical material and research methods; contributions to research on crowdsourcing digital heritage resources (particularly the tensions between competition and co-operation, between the urge to hoard or share resources) and prototype interfaces or applications based on the case study.

The project also provides opportunities to reflect on what it means to generate as well as consume digital data in the course of research, and on the changes digital opportunities have created for the arts and humanities researcher.

** This case study is informed by my thinking around the possibilities of re-populating the landscape with references to the lives, events, objects, etc, held by museums and other cultural heritage institutions, e.g. outside museum walls and by an experimental, collaborative project around 'modern bluestockings', that aimed to locate and re-display the forgotten stories around unconventional and pioneering women in science, technology and academia.

Interview about museum metadata games and a pretty picture

I haven't had a chance to follow up Design constraints and research questions: museum metadata games with a post about the design process for the museum metadata games I've made for my dissertation project (because, stupidly, I slipped on black ice and damaged my wrist), so in the meantime here's a link to an interview Seb Chan did with me for the Fresh+New blog, Interview with Mia Ridge on museum metadata games, and a Wordle of the tags added so far.

There have been nearly 700 turns on the games so far, which have collectively added about 30 facts (Donald’s detective puzzle) and just over 3,700 tags (Dora’s lost data).

Some of the 1,582 unique tags added so far

Design constraints and research questions: museum metadata games

Back in June I posted parts of my dissertation project outline in 'Game mechanics for social good: a case study on interaction models for crowdsourcing museum collections enhancement'. Since then, I've been getting on with researching, designing, building and evaluating museum metadata games (in my copious spare time after work, in a year when we launched three major galleries).

I'm planning to blog bits of my dissertation as I write it up so there'll be more posts over the next month, but for now I wanted to contextualise the two games I'm evaluating at the moment.  In the next post I'll talk about the changes I made after the first solid round of evaluation.

Casual games
The two games, nicknamed 'Dora' and 'Donald', are designed as casual games – something you can pick up and play for five minutes at a time.  Design goals included: an instantly playable game that provides stress relief, supports a competitive spirit (but not necessarily against other people), offers an inherently rewarding experience, has simple game play and puts 'fun before do-gooding'.  The games were designed around a specific research-based persona ('Meet Janet', pdf link) – hopefully they're exactly right for some people who are close to the persona in various ways, and quite fun for a wider group.  They won't suit everyone, not least because definitions of 'fun' and expectations around 'games' can be deeply individual.

Design constraints
The games are also designed to test ideas about the types of objects and records that can be used successfully, and the types of content people would be able to contribute about the less charismatic and emotionally accessible reaches of science, technology and social history collections – this means that some of the objects I've used are quite technical, not all the images are great and small variations on object records are repeated (risking 'not another bloody telescope').  While this might match the reality of museum catalogues, would it still allow for a fun game?

The realities of a project I was building in my free time and my lack of graphic design and illustration skills also provided constraints – it had to be browser-based, it couldn't rely on a critical mass of concurrent players to validate actions or content, it had to help the player dive straight into playing and overcome any fears about creating content about museum objects, and it had to use objects ingested through available museum APIs (I selected broad subjects for testing but didn't individually select any objects).

I then added a few extra constraints by deciding to build it as a WordPress plugin – I wanted to take advantage of the CMS-like framework for user logins, navigation and page layout, and I wanted the code I wrote to be usable by others without too much programming overhead.  I'll need to tidy up the code at the end, but once that's done you should be able to install it on any hosted WordPress installation.  I'm making a related plugin to help you populate the database with objects (also part of an experiment in the effectiveness of letting people choose their own subject areas or terms to select playable objects).  I'll talk more about how I worked with those constraints and how they informed the changes I made after evaluation in a later post.

Different games for different purposes
I've been thinking about a museum metadata game typology, which not only considers different types of fun, but also design constraints like:

  • the type and state of the collection (e.g. art works, technical/specialist and social history objects; photographs and other media vs objects; reference collections vs selected highlights; 'tombstone' vs general vs interpretative records)
  • the type of data sought including information curators could add if they had infinite time (detail on the significance of the object, links to other subjects, people, events, objects, collections, etc); information that can be extrapolated from the existing catalogue record; things curators couldn't know (personal history, experiential accounts about the design, manufacture, use, disposal etc of objects); emotional responses; external specialist knowledge; amateur/hobbyist specialist knowledge; synonyms in every day language; terms in other spoken languages

I've also been playing with the idea of linking different game types to different 'life stages' of museum collection metadata.  For example, some games could help a museum work out which of its catalogued items seem more interesting to the public, others help gather tags, create links between items or encourage players to research objects and record new information or links about them, and others still could work well for validating data created in earlier games.  The data I gather through evaluating the games I've designed will help test this model.

So, all that said, if you'd like to play (and help with my evaluation), the two games are:

Donald's detective puzzle – find a fact about an object
Dora's lost data – a simpler tagging game

Notes on 'User Generated Content' session, Open Culture Conference 2010

My notes from the 'user generated content' parallel track on first day of the Open Culture 2010 conference. The session started with brief presentations by panellists, then group discussions at various tables on questions suggested by the organisers. These notes are quite rough, and of course any mistakes are mine. I haven't had a chance to look for the speakers' slides yet so inevitably some bits are missing, and I can only report the discussion at the table I was at in the break-out session. I've also blogged my notes from the plenary session of the Open Culture 2010 conference.

User-generated content session, Open Culture, Europeana – the benefits and challenges of UGC.
Kevin Sumption, User-generated content, a MUST DO for cultural institutions
His background – originally a curator of computer sciences. One of first projects he worked on at Powerhouse was D*Hub which presented design collections from V&A, Brooklyn Museum and Powerhouse Museum – it was for curators but also for general public with an interest in design. Been the source of innovation. Editorial crowd-sourcing approach and social tagging, about 8 years ago.

Two years ago he moved to National Maritime Museum, Royal Observatory, Greenwich. One of the first things they did was get involved with Flickr Commons – get historic photographs into public domain, get people involved in tagging. c1000 records in there. General public have been able to identify some images as Alan Villiers images – specialists help provide attribution for the photographer. Only for tens of records of the 000s but was a good introduction to power of UGC.

Building hybrid exhibition experiences – astronomy photographer of the year – competition on Flickr with real world exhibition for the winners of the competition. 'Blog' with 2000 amateur astronomers, 50 posts a day. Through power of Flickr has become a significant competition and brand in two years.

Joined citizen science consortia. Galaxy Zoo. Brainchild of Oxford – getting public engaged with real science online. Solar Stormwatch c 3000 people analysing and using the data. Many people who get involved gave up science in high school… but people are getting re-engaged with science *and* making meaningful contributions.

Old Weather – helping solve real-world problems with crowdsourcing. Launched two months ago.
Passion for UGC is based around where projects can join very carefully considered consortia, bringing historical datasets with real scientific problems. Can bring large interested public to the project. Many of the public are reconnecting with historical subject matter or sciences.

Judith Bensa-Moortgat, Nationaal Archief, Netherlands, Images for the Future project
Photo collection of more than 1 million photos. Images for the future project aims to save audio-visual heritage through digitisation and conservation of 1.2 million photos.

Once digitised, they optimise by adding metadata and context. Have own documentalists who can add metadata, but it would take years to go through it all. So decided to try using online community to help enrich photo collections. Using existing platforms like Wikipedia, Flickr, OpenStreetMap, they aim to retrieve contextual info generated by the communities.  They donated political portraits to Wikimedia Commons and within three weeks more than half had been linked to relevant articles.

Their experiences with Flickr Commons – they joined in 2008. Main goal was to see if community would enrich their photos with comments and tags. In two weeks, they had 400,000 page views for 400 photos, including peaks when on Dutch TV news. In six months, they had 800 photos with over 1 million views. In Oct 2010, they are averaging 100,000 page views a month; 3 million overall.

But what about comments etc? Divided them into categories of comments [with percentage of overall contributions]:

  • factual info about location, period, people 5%; 
  • link to other sources eg Wikipedia 5%; 
  • personal stories/memories (e.g. someone in image was recognised); 
  • moral discussions; 
  • aesthetical discussions; 
  • translations.

The first two are most important for them.
13,000 tags in many languages (unique tags or total?).
10% of the contributed UGC was useful for contextualisation; tags ensure accessibility [discoverability?] on the web; increased (international) visibility. [Obviously the figures will vary for different projects, depending on what the original intent of the project was]

The issues she'd like to discuss are – copyright, moderation, platforms, community.

Mette Bom, 1001 Stories about Denmark
Story of the day is one of the 1001 stories. It's a website about the history and culture of Denmark. The stories have themes, are connected to a timeline.  Started with 50 themes, 180 expert writers writing the 1001 stories, now it's up to the public to comment and write their own stories. Broad definition of what heritage is – from oldest settlement to the 'porn street' – they wanted to expand the definition of heritage.

Target audiences – tourists going to those places; local dedicated experts who have knowledge to contribute. Wanted to take Danish heritage out of museums.

They've created the main website, mobile apps, widget for other sites, web service.  Launched in May 2010.  20,000 monthly users. 147 new places added, 1500 pictures added.

Main challenges – how to keep users coming back? 85% new, 15% repeat visitors (ok as aimed at tourists but would like more comments). How to keep press interested and get media coverage? Had a good buzz at the start cos of the celebrities. How to define participation? Is it enough to just be a visitor?

Johan Oomen, Netherlands Institute for Sound and Vision, Vrije Universiteit Amsterdam. Participatory Heritage: the case of the Waisda? video labelling game.
They're using game mechanisms to get people to help them catalogue content. [sounds familiar!]
'In the end, the crowd still rules'.
Tagging is a good way to facilitate time-based annotation [i.e. tag what's on the screen at different times]

Goal of game is consensus between players. Best example in heritage is steve.museum; much of the thinking about using tagging as a game came from Games with a Purpose (gwap.com).  Basic rule – players score points when their tag exactly matches the tag entered by another within 10 seconds. Other scoring mechanisms.  Lots of channels with images continuously playing.
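[A rough sketch of that basic rule as I understood it, not the actual Waisda? code. The names (TagEntry, score_matches), the one point per match, ignoring case and extra spaces, and measuring the 10 seconds against the video timeline are all my assumptions; the talk only gave the 'exact match within 10 seconds' rule and mentioned that other scoring mechanisms exist.]

```python
from dataclasses import dataclass

MATCH_WINDOW_SECONDS = 10  # from the talk: tags must match within 10 seconds


@dataclass(frozen=True)
class TagEntry:
    """One tag typed by one player (hypothetical structure)."""
    player: str
    tag: str
    video_time: float  # seconds into the video when the tag was entered


def normalise(tag: str) -> str:
    # Assumption: 'exact match' ignores case and surrounding/extra spaces.
    return " ".join(tag.lower().split())


def score_matches(entries, points_per_match=1):
    """Give points to both players whenever two different players
    enter the same tag no more than MATCH_WINDOW_SECONDS apart."""
    scores = {}
    ordered = sorted(entries, key=lambda e: e.video_time)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.video_time - first.video_time > MATCH_WINDOW_SECONDS:
                break  # later entries are even further away in time
            same_tag = normalise(first.tag) == normalise(second.tag)
            if same_tag and first.player != second.player:
                for player in (first.player, second.player):
                    scores[player] = scores.get(player, 0) + points_per_match
    return scores


if __name__ == "__main__":
    demo = [
        TagEntry("ann", "boat", 5.0),
        TagEntry("bob", "Boat", 12.0),   # within 10s of ann's tag
        TagEntry("cat", "boat", 30.0),   # too late to match ann or bob
    ]
    print(score_matches(demo))
```

Because a tag only counts when a second player independently agrees, nothing needs to be checked by staff before points are awarded, which fits the 'consensus between players' goal.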

Linking it to twitter – shout out to friends to come join them playing.  Generating traffic – one of the main challenges. Altruistic message 'help the archive' 'improve access to collections' came out of research with users on messages that worked. Worked with existing communities.

Results, first six months – 44,362 pageviews. 340,000 tags to 604 items, 42,068 unique tags.
Matches – 42% of tags entered more than 2 times. Also looked at vocab (GTAA, Cornetto), 1/3 words were valid Dutch words, but only a few part of thesauruses.  Tags evaluated by documentalists. Documentary film 85% – tags were useful; for reality series (with less semantic density) tags less useful.
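[To make the difference between total tags, unique tags and repeated tags concrete, here is a small sketch. The function name tag_stats is made up, and I'm assuming the 'entered more than 2 times' figure is a share of unique tags rather than of all tag entries; the talk didn't say which.]

```python
from collections import Counter


def tag_stats(tags, min_repeats=2):
    """Summarise a list of tag entries (hypothetical helper).

    Returns the total number of entries, the number of distinct tags,
    and the share of distinct tags entered more than `min_repeats` times.
    """
    tags = list(tags)
    counts = Counter(tags)
    unique = len(counts)
    repeated = sum(1 for n in counts.values() if n > min_repeats)
    return {
        "total": len(tags),
        "unique": unique,
        "share_repeated": repeated / unique if unique else 0.0,
    }


if __name__ == "__main__":
    sample = ["boat", "boat", "boat", "sea", "sea", "sky"]
    print(tag_stats(sample))
```

The same counts would answer my earlier '(unique tags or total?)' question for any project that publishes its raw tag entries.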

Now looking at how to present tags on the catalogue Powerhouse Museum style.  Experimenting with visualising terms, tag clouds when terms represented, also makes it easy to navigate within the video – would have been difficult to do with professional metadata.  Looking at 'tag gardening' – invite people to go back to their tags and click to confirm – e.g. show images with particular tags, get more points for doing it.

Future work – tag matching – synonyms and more specific terms – will get more points for more specific terms.

Panel overview by Costis Dallas, research fellow at Athena, assistant professor at Panteion University, Athens.
He wants to add a different dimension – user-generated content as it becomes an object for memory organisations. New body of resources emerging through these communication practices.
Also, we don't have a historiography anymore; memory resides in personal information devices.  Mashups, changes in information forms, complex composed information on social networks – these raise new problems for collecting – structural, legal, preservation in context, layered composition.  What do we need to do now in order to be able to make use of digital technologies in appropriate, meaningful ways in the future? New kinds of content, participatory curation are challenges for preservation.

Group discussion (breakout tables)
Discussion about how to attract users. [It wasn't defined whether it was how to attract specifically users who'll contribute content or just generally grow the audience and therefore grow the number of content creators within the usual proportions of levels of participation e.g. Nielsen, Forrester; I would also have liked to discuss how to encourage particular kinds of contributions, or to build architectures of participation that provided positive feedback to encourage deeper levels of participation.]

Discussion and conclusions included – go with the strengths of your collections e.g. if one particular audience or content-attracting theme emerges, go with it.  Norway has a national portal where people can add content. They held lots of workshops for possible content creators; made contact with specialist organisations [from which you can take the lesson that UGC doesn't happen in a vacuum, and that it helps to invest time and resources into enabling participants and soliciting content].  Recording living history.  Physical presence in gallery, at events, is important.  Go where audiences already are; use existing platforms.

Discussion about moderation included – once you have comments, how are they integrated back into collections and digital asset management systems?  What do you do about incorrect UGC displayed on a page?  Not an issue if you separate UGC from museum/authoritative content in the interface design.  In the discussion it turned out that Europeana doesn't have a definition of 'moderation'.  IMO, it should include community management, including acknowledging and thanking people for contributions (or rather, moderation is a subset of community management).  It also includes approving or reviewing and publishing content, dealing with corrections suggested by contributors, dealing with incorrect or offensive UGC, adding improved metadata back to collections repositories.

User-generated content and trust – British Library apparently has 'trusted communities' on their audio content – academic communities (by domain name?) and 'everyone else'.  Let other people report content to help weed out bad content.

Then we got onto a really interesting discussion of which country or culture's version of 'offensive' would be used in moderating content.  Having worked in the UK and the Netherlands, I know that what's considered a really rude swear word and what's common vocabulary is quite different in each country… but would there be any content left if you considered the lowest common standards for each country?  [Though thinking about it later, people manage to watch films and TV and popular music from other countries so I guess they can deal with different standards when it's in context.]  To take an extreme content example, a Nazi uniform as memorabilia is illegal in Germany (IIRC) but in the UK it's a fancy dress outfit for a member of the royal family.

Panel reporting back from various table discussions
Kevin's report – discussion varied but similar themes across the two tables. One – focus on the call to action, why should people participate, what's the motivation? How to encourage people to participate? Competitions suggested as one solution, media interest (especially sustained). Notion of core group who'll energise others. Small groups of highly motivated individuals and groups who can act as catalysts [how to recruit, reward, retain]. Use social media to help launch project.

1001 Danish Stories promotional video effectively showed how easy the process of contributing content was, and that it doesn't have to be perfect (the video includes celebrities working the camera [and also being a bit daggy, which I later realised was quite powerful – they weren't cool and aloof]).
Giving users something back – it's not a one-way process. Recognition is important. Immediacy too – if participating in a project, people want to see their contributions acknowledged quickly. Long approval processes lose people.
Removal of content – when different social, political backgrounds with different notions of censorship.

Mette's report – how to get users to contribute – answers mostly to take away the boundaries, give the users more credit than we otherwise tend to. We always think users will mess things up and experts will be embarrassed by user content but not the case. In 1001 they had experts correcting other experts. Trust users more, involve experts, ask users what they want. Show you appreciate users, have a dialogue, create community. Make it a part of life and environment of users. Find out who your users are.

Second group – how Europeana can use the content provided in all its forms. Could build web services to present content from different places, linking between different applications.
How to set up goals for user activity – didn't get a lot of answers but one possibility is to start and see how users contribute as you go along. [I also think you shouldn't be experimenting with UGC without some goal in mind – how else will you know if your experiment succeeded?  It also focusses your interaction and interface design and gives the user some parameters (much more useful than an intimidating blank page)].

Judith's report (including our table) – motivation and moderation in relation to Europeana – challenging as Europeana are not the owners of the material; also dealing with multilingual collections. Culturally-specific offensive comments. Definition and expectations of Europeana moderation. Resources needed if Europeana does the moderation.
Incentives for moderation – improving data, idealism, helping with translations – people like to help translate.

Johan's report – rewards are important – place users in social charts or give them a feeling of contributing to larger thing; tap into existing community; translate physical world into digital analogue.
Institutional policy – need a clear strategy for e.g. how to integrate the knowledge into the catalogue. Provide training for staff on working with users and online tools. There's value in employing community managers to give people feedback when they leave content.
Using Amazon's Mechanical Turk for annotations…
Doing the projects isn't only of benefit in enriching metadata but also for giving insight into users – discover audiences with particular interests.

Costis commenting – if Europeana only has thumbnails and metadata, is it a missed opportunity to get UGC on more detailed content?

Is Europeana highbrow compared to other platforms like Flickr, FB, so would people be afraid to contribute? [probably – there must be design patterns for encouraging participation from audiences on museum sites, but we're still figuring out what they are]
Business model for crowdsourcing – producing multilingual resources is perfect case for Europeana.

Open to the floor for questions… Importance of local communities, getting out there, using libraries to train people. Local newspapers, connecting to existing communities.

Notes from Europeana's Open Culture Conference 2010

The Open Culture 2010 conference was held in Amsterdam on October 14 – 15. These are my notes from the first day (I couldn't stay for the second day). As always, they're a bit rough, and any mistakes are mine. I haven't had a chance to look for the speakers' slides yet so inevitably some bits are missing.  If you're in a hurry, the quote of the day was from Ian Davis: "the goal is not to build a web of data. The goal is to enrich lives through access to information".

The morning was MCd by Costis Dallas and there was a welcome and introduction from the chair of the Europeana Foundation before Jill Cousins (Europeana Foundation) provided an overview of Europeana. I'm sure the figures will be available online, but in summary, they've made good progress in getting from a prototype in 2008 to an operational service in 2010. [Though I have written down that they had 1 million visits in 2010, which is a lot less than a lot of the national museums in the UK though obviously they've had longer to establish a brand and a large percentage of their stats are probably in the 'visit us' areas rather than collections areas.]

Europeana is a super-aggregator, but doesn't show the role of the national or thematic aggregators or portals as providers/collections of content. They're looking to get away from a one-way model to the point where they can get data back out into different places (via APIs etc). They want to move away from being a single destination site to putting information where the user is, to continue their work on advocacy, open source code etc.

Jill discussed various trends, including the idea of an increased understanding that access to culture is the foundation for a creative economy. She mentioned a Kenneth Galbraith [?] quote on spending more on culture in recession as that's where creative solutions come from [does anyone know the reference?]. Also, in a time of increasing nationalism, Europeana provided an example of trans-European cooperation and culture to combat it. Finally, customer needs are changing as visitors move from passive recipients to active participants in online culture.

Europeana [or the talk?] will follow four paths – aggregation, distribution, facilitation, engagement.

  • Aggregation – build the trusted source for European digital cultural material. Source curated content, linked data, data enrichment, multilinguality, persistent identifiers. 13 million objects but 18-20thC dominance; only 2% of material is audio-visual [?]. Looking towards publishing metadata as linked open data, to make Europeana and cultural heritage work on the web, e.g. of tagging content with controlled vocabularies – Vikings as tagged by Irish and Norwegian people – from 'pillagers' to 'loving fathers'. They can map between these vocabularies with linked data.
  • Distribution – make the material available to the user wherever they are, whenever they want it. Portals, APIs, widgets, partnerships, getting information into existing school systems.
  • Facilitate innovation in cultural heritage. Knowledge sharing (linked data), IPR business models, policy – advocacy and public domain, data provider agreements. If you write code based on their open sourced applications, they'd love you to commit any code back into Europeana. Also, look at Europeana labs.
  • Engagement – create dialogue and participation. [These slides went quickly, I couldn't keep up]. Examples of the Great War Archive into Europe [?]. Showing the European connection – Art Nouveau works across Europe.

The next talk was Liam Wyatt on 'Peace love and metadata', based in part on his experience at the British Museum, where he volunteered for a month to coordinate the relationship between Wikipedia as representative of the open web [might have mistyped that, it seems quite a mantle to claim] and the BM as representative of [missed it]. The goal was to build a proactive relationship of mutual benefit without requiring change in policies or practices of either. [A nice bit of realism because IMO both sides of the museum/Wikipedia relationship are resistant to change and attached firmly to parts of their current models that are in conflict with the other conglomeration.]

The project resulted in 100 new Wikipedia articles, mostly based on the BM/BBC A History of the World in 100 Objects project (AHOW). [Would love to know how many articles were improved as a result too]. They also ran a 'backstage pass' day where Wikipedians come on site, meet with curators, backstage tour, then they sit down and create/update entries. There were also one-on-one collaborators – hooking up Wikipedians and curators/museums with e.g. photos of objects requested.

It's all about improving content, focussing on personal relationships, leveraging the communities; it didn't focus on residents (his own work), none of them are content donation projects, every institution has different needs but can do some version of this.

[I'm curious about why it's about bringing Wikipedians into museums and not turning museum people into Wikipedians but I guess that's a whole different project and may result from the personal relationships anyway.]

Unknown risks are accounted for and overestimated. Unknown rewards are not accounted for and underestimated. [Quoted for truth, and I think this struck a chord with the audience.]

Reasons he's heard for restricting digital access… Most common 'preserving the integrity of the collection' but sounds like need to approve content so can approve of usages. As a result he's seen convoluted copyright claims – it's easy tool to use to retain control.

Derivative works. Commercial use. Different types of free – freedom to use, freedom to study and apply knowledge gained; freedom to make and redistribute copies; [something else].

There are only three applicable licences for Wikipedia. Wikipedia is a non-commercial organisation, but doesn't accept any non-commercially licensed content as 'it would restrict the freedom of people downstream to re-use the content in innovative ways'. [but this rules out much museum content, whether rightly or not, and with varying sources from legal requirements to preference. Licence wars (see the open source movement) are boring, but the public would have access to more museum content on Wikipedia if that restriction was negotiable. Whether that would outweigh the possible 'downstream' benefit is an interesting question.]

Liam asked the audience, do you have a volunteer project in your institution? do you have an e-volunteer program? Well, you do already, you just don't know it. It's a matter of whether you want to engage with them back. You don't have to, and it might be messy.

Wikipedia is not a social network. It is a social construction – it requires a community to exist but socialising is not the goal. Wikipedia is not user generated content. Wikipedia is community curated works. Curated, not only generated. Things can be edited or deleted as well as added [which is always a difficulty for museums thinking about relying on Wikipedia content in the long term, especially as the 'significance' of various objects can be a contested issue.]

Happy datasets are all alike; every unhappy dataset is unhappy in its own way. A good test of data is that it works well with others – technically or legally.

According to Liam, Europeana is the 21st century version of the gallery painting – it's a thumbnail gallery but it could be so much more if the content was technically and legally able to be re-used, integrated.
Data has enough restrictions already e.g. copyright, donor restrictions. But if it comes without restrictions, it's a shame to add them. 'Leave the gate as you found it'.

'We're doing the same thing for the same reason for the same people in the same medium, let's do it together.'

The next sessions were 'tasters' of the three thematic tracks of the second part of the day – linked data, user-generated content, and risks and rewards. This was a great idea because I felt like I wasn't totally missing out on the other sessions.

Ian Davis from Talis talked about 'linked open culture' as a preview of the linked data track. How to take practices learned from linked data and apply them to the open culture sector. We're always looking for ways to exchange info, communicate more effectively. We're no longer limited by the physicality of information. 'The semantic web fundamentally changes how information, machines and people are connected together'. The semantic web and its powerful network effects are enabling a radical transformation away from islands of data. One question is, does preservation require protection, isolation, or to copy it as widely as possible?

Conjecture 1 – data outlasts code. MARC stays forever, code changes. This implies that open data is more important than open source.
Conjecture 2 – structured data is more valuable than unstructured. Therefore we should seek to structure our data well.
Conjecture 3 – most of the value in our data will be unexpected and unintended. Therefore we should engineer for serendipity.

'Provide and enable' – UK National Archives phrase. Provide things you're good at – use unique expertise and knowledge [missed bits]… enable as many people as possible to use it – licence data for re-use, give important things identifiers, link widely.

'The goal is not to build a web of data. The goal is to enrich lives through access to information.'
[I think this is my new motto – it sums it up so perfectly. Yes, we carry on about the technology, but only so we can get it built – it's the means to an end, not the end itself. It's not about applying acronyms to content, it's about making content more meaningful, retaining its connection to its source and original context, making the terms of use clear and accessible, making it easy to re-use, encouraging people to make applications and websites with it, blah blah blah – but it's all so that more people can have more meaningful relationships with their contemporary and historical worlds.]

Kevin Sumption from the National Maritime Museum presented on the user-generated content track. A look ahead – the cultural sector and new models… User-generated content (UGC) is a broad description for content created by end users rather than traditional publishers. Museums have been active in photo-sharing, social tagging, Wikipedia editing.

Crowdsourcing e.g. – reCAPTCHA [digitising books, one registration form at a time]. His team was inspired by the approach, created a project called 'Old Weather' – people review logs of WWI British ships to transcribe the content, especially meteorological data. This fills in a gap in the meteorological dataset for 1914 – 1918, allows weather in the period to be modelled, contributes to understanding of global weather patterns.

Also working with Oxford Uni, Rutherford Institute, Zooniverse – solar stormwatch – solar weather forecast. The museum is working with research institutions to provide data to solve real-world problems. [Museums can bring audiences to these projects, re-ignite interest in science, you can sit at home or on the train and make real contributions to on-going research – how cool is that?]

Community collecting. e.g. mass observation project 1937 – relaunched now and you can train to become an observer. You get a brief e.g. families on holidays.

BBC WW2 People's War – archive of WWII memories. [check it out]

RunCoCO – tools for people to set up community-led, generated projects.

Community-led research – a bit more contentious – e.g. Guardian and MPs' expenses. Putting data in hands of public, trusting them to generate content. [Though if you're just getting people to help filter up interesting content for review by trusted sources, it's not that risky].

The final thematic track preview was by Charles Oppenheim from Loughborough University, on the risks and rewards of placing metadata and content on the web. Legal context – authorisation of copyright holder is required for [various acts including putting it on the web] unless… it's out of copyright, have explicit permission from rights holder (not implied licence just cos it's online), permission has been granted under licensing scheme, work has been created by a member of staff or under contract with IP assigned.

Issues with cultural objects – media rich content – multiple layers of rights, multiple rights holders, multiple permissions often required. Who owns what rights? Different media industries have different traditions about giving permission. Orphan works.

Possible non-legal ramifications of IPR infringements – loss of trust with rights holders/creators; loss of trust with public; damage to reputation/bad press; breach of contract (funding bodies or licensors); additional fees/costs; takedown of content or entire service.

Help is at hand – Strategic Content Alliance toolkit [online].

Copyright has less to do with law than with risk management – assess the risks and work out how you will minimise them.

Risks beyond IPR – defamation; liability for provision of inaccurate information; illegal materials e.g. pornography, pro-terrorism, violent materials, racist materials, Holocaust denial; data protection/privacy breaches; accidental disclosure of confidential information.

High risk – anything you make money from; copying anything that is in copyright and is commercially available.
Low risk – orphan works of low commercial value – letters, diaries, amateur photographs, films, recordings by less well-known people.
Zero risk stuff.
Risks on the other side of the coin [aka excuses for not putting stuff up]

Ask a cultural heritage technologist?

I'm speaking at Open Tech 2010 (book your ticket now, only £5!) and it feels like the situation (and the mood) in the UK has changed since I first wrote my proposal and I'm not sure it suits anymore.  So I wanted to throw a few questions open to you to help me re-focus on the things that matter now:

  • what do you value about museums and technology, particularly the web, social media, open data? 
  • what do you want to know from someone working behind the scenes in museum technology?
  • what suggestions would you make if you were able to talk to museums?
  • what aren't museums asking our audiences (including our geek audiences) that we should be asking?
  • what's your favourite biscuit (or cookie)?

The title, by the way, is a play on 'ask a curator', an online event of some sort where you can ask whatever you've always wanted to ask a curator by using the hash tag #askacurator on twitter (or possibly also by commenting on a museum's blog, Facebook wall, twitter account, etc).

'Game mechanics for social good: a case study on interaction models for crowdsourcing museum collections enhancement'

I've been very quiet lately – exams for my MSc and work on the digital infrastructure for two new galleries (and a contemporary science news website) opening next week at the Science Museum have kept me busy – but I wanted to take a moment to post about my dissertation project. (Which reminds me, I should write up the architecture I designed to extend our core Sitecore CMS with WordPress to support social media-style interactions with Science Museum-authored content.)

Anyway. This project is for my dissertation for City University's Human-Centred Systems MSc. I'm happy to share the whole outline, but it's a bit academic in format for a blog post so I've just posted an excerpt here. I'd love to hear your comments, particularly if you know of, or have been involved in creating, crowdsourced museum projects or games for social good.

'Game mechanics for social good: a case study on interaction models for crowdsourcing museum collections enhancement' is the current title – it's a bit of a mouthful but hopefully the project will do what it says on the tin.

Project description
The primary focus of this project is the design and evaluation of interactions applied to the context of an online museum collection in order to encourage members of the public to undertake specific tasks that will help improve the website.

The project will include a design and build component to create game-like interfaces for testing and evaluation, but the main research output is the analysis of museum crowdsourced projects and 'games for social good' to develop potential models for game-like interactions suitable for museum collections, and the subsequent evaluation of the proposed interaction models.

Aims and Objectives
This project aims to answer this question: can game-like interactions be designed to motivate people to undertake tasks on museum websites that will improve the overall quality of the website for other visitors?

More specifically, which elements of game mechanics are effective when applied to interfaces to crowdsource museum collections enhancement?

Objectives

  • Design game-like interaction models applicable to cultural heritage content and audiences through research, analysis and creativity workshops
  • Build an application and interfaces to create and store user-created content linked to collections content
  • Evaluate the effectiveness of game-like interaction models for eliciting useful content

Theory
Recent projects such as Armchair Revolutionary[1] and earlier projects such as Carnegie Mellon University's 'Games with a purpose'[2] and InterroBang?![3] are indicative of the trend for 'games for social good'. Crowdsourced projects such as the Guardian newspaper's examination of MPs' expense claims[4], the V&A Museum's image cropping[5], Brooklyn Museum's tagging game[6] and the National Library of Australia's collaborative OCR corrections[7]; Chen's (2006) study of the application of Csikszentmihalyi's theory of 'flow' to game design; and Dr Jane McGonigal's ideas about multiplayer games as 'happiness engines'[8] all suggest that 'playful interactions' and crowd participation could be applied to help create specific content improvements on museum sites. Game mechanisms may help make tasks that would not traditionally be considered fun or relaxing into a compelling experience.

Within the terms of this project, the output of a game-like interaction must produce an effect outside the interaction itself – that is, the result of a user's interactions with the site should produce beneficial effects for other site visitors who are not involved in the original interactions. To achieve this, it must generate content to enhance the site for subsequent visitors. Methods to achieve this could include creating trails of related objects, entering tags to describe objects, writing alternative labels or researching objects – these will be defined during the research phase and creativity workshops.

Methods and tools

The project is divided into several stages, each with their own methodology and considerations.

Research
The preliminary research process involves a literature review, research into game mechanics and the theory of flow, and research into museum audiences online. It will also include a series of short semi-structured interviews with people involved in creating crowdsourced projects on museum sites or game-like interactions to encourage the completion of set tasks (e.g. games for social good) in order to learn from their reflections on the design process; and analysis of existing sites in both these areas against the theories of game design. This research will define the metrics of the evaluation phase.

Creativity workshop(s)
The results of this research phase set the parameters for creativity workshops designed to come up with ideas and possible designs for the game-like interfaces to be built. Possible objectives for the creativity workshop include:

  • designing methods for building different levels of challenge into the user experience in an environment that does not easily support different levels of challenge when museum-related skills remain at a constant level
  • creating experiences that are intrinsically rewarding to enable 'flow' within the constraints of available content

Build and test
In turn, the creativity workshops will help determine the interfaces to be built and tested in the later part of the project. The build will be iterative, and is planned to involve as many build-test-review-build iterations as will fit in the allocated time, in order to test as many variant interaction models as possible and support optimisation of existing designs after evaluation. User recruitment in this phase may be a sample of convenience from the target age group.

The interfaces will be developed in HTML, CSS and JavaScript, and published on a WordPress platform. This allows a neat separation of functionality and interface design. Session data (date, interface version, tester ID) can be recorded alongside user data. WordPress's template and plug-in based architecture also supports clear versioning between different iterations of the design, allowing reconstruction of earlier versions of the interfaces for later comparison, and enabling possible split A/B trials.
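As a rough illustration of the session data and split A/B trials mentioned above, here is a minimal sketch in Python (not the actual WordPress implementation, which isn't built yet). The variant labels, the `assign_variant` and `start_session` functions and the `SessionRecord` fields are all hypothetical names made up for this example; it simply assumes each tester has a stable ID and should see the same interface version on every visit.

```python
import hashlib
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional

# Hypothetical variant labels; real interface versions would come from the build iterations.
VARIANTS = ("A", "B")


def assign_variant(tester_id: str, variants=VARIANTS) -> str:
    """Deterministically map a tester ID to a variant, so repeat visits get the same interface."""
    digest = hashlib.sha256(tester_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


@dataclass
class SessionRecord:
    """The session data named in the proposal: date, interface version, tester ID."""
    session_date: str
    interface_version: str
    tester_id: str


def start_session(tester_id: str, today: Optional[date] = None) -> dict:
    """Build a session record that could be stored alongside the user-created content."""
    today = today or date.today()
    record = SessionRecord(today.isoformat(), assign_variant(tester_id), tester_id)
    return asdict(record)
```

Hashing the tester ID rather than picking a variant at random means no extra lookup table is needed to remember who saw what, although the split will only be roughly even with a small sample of testers.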

Analysis and write-up
Analysis will include the results of user testing and user data recorded in the WordPress platform to evaluate the performance of various interface and interaction designs. If the platform attracts usage outside the user testing sessions it may also include log file or Google Analytics analysis of use of the interfaces.

[1] https://www.armrev.org
[2] http://www.gwap.com/
[3] http://www.playinterrobang.com/
[4] http://mps-expenses.guardian.co.uk/
[5] http://collections.vam.ac.uk/crowdsourcing/
[6] http://www.brooklynmuseum.org/opencollection/tag_game/start.php
[7] http://newspapers.nla.gov.au/ndp/del/home
[8] http://www.futureofmuseums.org/events/lecture/mcgonigal.cfm