‘I see, I feel, hence I notice, I observe, and I think’

More and more open and/or linkable cultural heritage data is becoming available, which means the next big challenge for memory institutions is dealing with ‘death by aggregation’: creating meaningful, engaging experiences of individual topics or objects within masses of digital data. With that in mind, I’ve been wondering about the application of Roland Barthes’ concepts of studium and punctum to large online collections. (I’m in the middle of research interviews for my PhD, and it’s amazing what one will think about in order to put off transcribing hours of recordings, but bear with me…)

Studium, in Wikipedia’s definition, is the ‘cultural, linguistic, and political interpretation of a photograph’.  While Barthes was writing about photography, I suspect studium describes the average, expected audience response to well-described images or objects in most collections sites – a reaction that exists within the bounds of education, liking and politeness.  However, punctum – in Barthes’ words, the ‘element which rises from the scene, shoots out of it like an arrow, and pierces me’ – describes the moment an accidentally poignant or meaningful detail in an image captures the viewer.  Punctum is often personal to the viewer, but when it occurs it brings with it ‘a power of expansion’: ‘I see, I feel, hence I notice, I observe, and I think’.  You cannot design punctum, but can we design collections interfaces to create the serendipitous experiences that enable punctum?  Is it even possible with images of objects, or is it more likely to occur with photographic collections?

While thinking about this, I came across an excellent post on Understanding Compelling Collections by John Coburn (@j0hncoburn) in which he describes some pilots on ‘compelling historic photography’ by Tyne & Wear Archives & Museums. The experiment asked two questions: ‘Which of our collections best lend themselves to impulse sharing online?’ and ‘Which of our collections are people most willing to talk about online?’. It’s well worth reading both for their methods and their results, which are firmly grounded in the audiences’ experience of their images: a ‘key finding from our trial with Flickr Commons was that the mass sharing of images often only became possible when a user defined or redefined the context of the photograph’, and ‘there’s a very real appetite on Facebook for old photography that strongly connects to a person’s past’.

Coming back to Barthes, their quest for images that ‘immediately resonated with our audience on an emotional level and without context’ is almost an investigation of enabling punctum; their answer, ‘anything that How To Be a Retronaut would share’, is probably good enough for most of us for now. To summarise, they’re ‘era-specific, event-specific, moment-specific’ images that ‘disrupt people’s model of time’, that ‘tap into magic and the sublime’, and that ‘stir your imagination, not demand prior knowledge or interest’. They’re small, tightly-curated, niche-interest sets of images with evocative titles.

That’s not how we generally think about or present online collections.  But what if we did?

[Update, May 16, 2012.

This post, from Flickr members co-curating an exhibition with the National Maritime Museum, offers another view – is the public searching for punctum when they view photographic collections, and does the museum/archive way of thinking about collections iron out the quirks that might lead to punctum?

‘It is frightening to imagine what treasures will never see the light of day from the collection at the Brass Foundry. I got the sense that the Curators and the National Maritime Museum in general see these images as closely guarded historical documents and as such offer insight [into] location, historical events and people in the image. There seems to be a lack of artistic appreciation for the variety of unusual and standalone images in the collection, raising an important question concerning the value attributed to each photograph when interpreted by an audience with different aesthetic interests. … In my opinion it is the ‘unknown’ quality of photography that initially inspires engagement and subsequently this process encourages an exploration of our own identity and how we as individuals create meaning.’  Source: ‘The Brass Foundary Visit 19/04/2012’]

The rise of the non-museum (and death by aggregation)

A bit of an art museum/gallery-focussed post… And when I say ‘post’, I mean ‘vaguely related series of random thoughts’… but these ideas have been building up and I might as well get them out of ‘draft’.

Following on from various recent discussions (especially the brilliantly thought-provoking MCG’s Spring meeting ‘Go Collaborate’) and the launches over the past few months of the Google Art Project, Artfinder and today’s ‘Your Paintings’ from the BBC and the Public Catalogue Foundation, I’ve been wondering what space is left for galleries online.  (I’ve also been thinking about Aaron’s “you are about to be eaten by robots” and the image of Google and Facebook ‘nipping at your heels’ to become ‘the arbiter of truth for ideas’ and the general need for museums to make a case for their special place in society.)  Between funding cuts on the one hand, and projects from giants like Google and the BBC and even Europeana on the other, what can galleries do online that no-one else can?

So I asked on twitter, wondering if the space that was left was in creating/curating specialist interest and/or local experiences… @bridgetmck responded “Maybe the space for museums to work online now is meaning-making, intellectual context, using content to solve problems?”  The idea that the USP of a museum is based on knowledge and community rather than collections is interesting and something I need to think about more.

The twitter conversation also branched off into a direction I’ve been thinking about over the past few months – while it’s great that we’re getting more and more open content [seriously, this is an amazing problem to have], what’s the effect of all this aggregation on the user experience?  @rachelcoldicutt had also been looking at ‘Your Paintings’ and her response to my ‘space’ question was: “I think the space left is for curation. I feel totally overwhelmed by ALL THOSE paintings. It’s like a storage space not a museum”.  She’d also just tweeted “are such enormous sites needed when you can search and aggregate? Phaps yes for data structure/API, but surely not for *ppl*” which I’m quoting because I’ve been thinking the same thing.

[Update 2, July 14: Or, as Vannevar Bush said in ‘As We May Think‘ in 1945: “There is a new profession of trail blazers, those who find delight in the task of establishing useful trails through the enormous mass of the common record.”]

Have we reached a state of ‘death by aggregation’?  Even the guys at Artfinder haven’t found a way to make endless lists of search results or artists feel more like fun than work.

Big aggregated collections are great one-stop shops for particular types of researchers, and they’re brilliant for people building services based on content, but is there a Dunbar number for the number of objects you can view in one sitting?  To borrow the phrase Hugh Wallace used at MuseumNext, ‘snackable‘ or bite-sized content seems to fit better into the lives of museum audiences, but how do we make collections and the knowledge around them ‘snackable’?  Which of the many ways to curate that content into smaller sets – tours, slideshows, personal galleries, recommender systems, storytelling – works in different contexts?  And how much and what type of contextual content is best, and what is that Dunbar number?  @benosteen suggested small ‘community sets’ or “personal ‘threads'” – “interesting people picking 6->12 related items (in their opinion) and discussing them?”.  [And as @LSpurdle pointed out, what about serendipity, or the ‘surprising beauty’ Rachel mentioned?]

I’m still thinking it all through, and will probably come back and update as I work it out.  In the meantime, what do you think?

[Update: I’ve only just remembered that I’d written about an earlier attempt to get to grips with the effects of aggregation and mental models of collections that might help museums serve both casual and specialist audiences in Rockets, Lockets and Sprockets – towards audience models about collections? – it still needs a lot of thought and testing with actual users, I’d love to hear your thoughts or get pointers to similar work.]

Notes on ‘User Generated Content’ session, Open Culture Conference 2010

My notes from the ‘user generated content’ parallel track on first day of the Open Culture 2010 conference. The session started with brief presentations by panellists, then group discussions at various tables on questions suggested by the organisers. These notes are quite rough, and of course any mistakes are mine. I haven’t had a chance to look for the speakers’ slides yet so inevitably some bits are missing, and I can only report the discussion at the table I was at in the break-out session. I’ve also blogged my notes from the plenary session of the Open Culture 2010 conference.

User-generated content session, Open Culture, Europeana – the benefits and challenges of UGC.
Kevin Sumption, User-generated content, a MUST DO for cultural institutions
His background – originally a curator of computer sciences. One of the first projects he worked on at Powerhouse was D*Hub, which presented design collections from the V&A, Brooklyn Museum and Powerhouse Museum – it was for curators but also for the general public with an interest in design. It’s been a source of innovation, with an editorial crowd-sourcing approach and social tagging, about 8 years ago.

Two years ago he moved to the National Maritime Museum, Royal Observatory, Greenwich. One of the first things they did was get involved with Flickr Commons – get historic photographs into the public domain, get people involved in tagging. c1000 records in there. The general public have been able to identify some images as Adam Villiers images – specialists help provide attribution for the photographer. Only tens of records out of the thousands, but it was a good introduction to the power of UGC.

Building hybrid exhibition experiences – Astronomy Photographer of the Year – a competition on Flickr with a real-world exhibition for the winners. A ‘blog’ with 2000 amateur astronomers, 50 posts a day. Through the power of Flickr it has become a significant competition and brand in two years.

Joined citizen science consortia. Galaxy Zoo – a brainchild of Oxford – gets the public engaged with real science online. Solar Stormwatch has c. 3000 people analysing and using the data. Many people who get involved gave up science in high school… but people are getting re-engaged with science *and* making meaningful contributions.

Old Weather – helping solve real-world problems with crowdsourcing. Launched two months ago.
His passion for UGC is based around projects that can join very carefully considered consortia, bringing historical datasets together with real scientific problems. They can bring a large interested public to the project. Many of the public are reconnecting with historical subject matter or the sciences.

Judith Bensa-Moortgat, Nationaal Archief, Netherlands, Images for the Future project
Photo collection of more than 1 million photos. The Images for the Future project aims to save audio-visual heritage through the digitisation and conservation of 1.2 million photos.

Once digitised, they optimise by adding metadata and context. They have their own documentalists who can add metadata, but it would take years to go through it all, so they decided to try using the online community to help enrich the photo collections. Using existing platforms like Wikipedia, Flickr and OpenStreetMap, they aim to retrieve contextual info generated by the communities.  They donated political portraits to Wikimedia Commons and within three weeks more than half had been linked to relevant articles.

Their experiences with Flickr Commons – they joined in 2008. Main goal was to see if community would enrich their photos with comments and tags. In two weeks, they had 400,000 page views for 400 photos, including peaks when on Dutch TV news. In six months, they had 800 photos with over 1 million views. In Oct 2010, they are averaging 100,000 page views a month; 3 million overall.

But what about comments etc? Divided them into categories of comments [with percentage of overall contributions]:

  • factual info about location, period, people 5%; 
  • link to other sources eg Wikipedia 5%; 
  • personal stories/memories (e.g. someone in the image was recognised); 
  • moral discussions; 
  • aesthetical discussions; 
  • translations.

The first two are most important for them.
13,000 tags in many languages (unique tags or total?).
10% of the contributed UGC was useful for contextualisation; tags ensure accessibility [discoverability?] on the web; increased (international) visibility. [Obviously the figures will vary for different projects, depending on what the original intent of the project was]

The issues she’d like to discuss are – copyright, moderation, platforms, community.

Mette Bom, 1001 Stories about Denmark
Story of the day is one of the 1001 stories. It’s a website about the history and culture of Denmark. The stories have themes, are connected to a timeline.  Started with 50 themes, 180 expert writers writing the 1001 stories, now it’s up to the public to comment and write their own stories. Broad definition of what heritage is – from oldest settlement to the ‘porn street’ – they wanted to expand the definition of heritage.

Target audiences – tourists going to those places; local dedicated experts who have knowledge to contribute. Wanted to take Danish heritage out of museums.

They’ve created the main website, mobile apps, widget for other sites, web service.  Launched in May 2010.  20,000 monthly users. 147 new places added, 1500 pictures added.

Main challenges – how to keep users coming back? 85% new, 15% repeat visitors (ok as it’s aimed at tourists, but they’d like more comments). How to keep the press interested and get media coverage? Had a good buzz at the start because of the celebrities. How to define participation? Is it enough to just be a visitor?

Johan Oomen, Netherlands Institute for Sound and Vision / Vrije Universiteit Amsterdam. Participatory Heritage: the case of the Waisda? video labelling game.
They’re using game mechanisms to get people to help them catalogue content. [sounds familiar!]
‘In the end, the crowd still rules’.
Tagging is a good way to facilitate time-based annotation [i.e. tag what’s on the screen at different times]

Goal of game is consensus between players. Best example in heritage is steve.museum; much of the thinking about using tagging as a game came from Games with a Purpose (gwap.com).  Basic rule – players score points when their tag exactly matches the tag entered by another within 10 seconds. Other scoring mechanisms.  Lots of channels with images continuously playing.
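The matching rule described above – points when two players enter exactly the same tag within ten seconds of each other – can be sketched roughly like this (a toy illustration, not the actual Waisda? code; the class, method and player names are all my own invention):

```python
from collections import defaultdict

MATCH_WINDOW = 10.0  # seconds within which two players' tags must agree

class TagScorer:
    """Toy sketch of Waisda?-style consensus scoring: a tag scores a
    point for both players when it exactly matches a tag entered by a
    different player within MATCH_WINDOW seconds."""

    def __init__(self):
        self.entries = defaultdict(list)  # tag -> [(player, timestamp), ...]
        self.scores = defaultdict(int)    # player -> points

    def add_tag(self, player, tag, timestamp):
        """Record a tag; award points for any matches within the window."""
        matched = False
        for other_player, other_time in self.entries[tag]:
            if other_player != player and abs(timestamp - other_time) <= MATCH_WINDOW:
                self.scores[player] += 1
                self.scores[other_player] += 1
                matched = True
        self.entries[tag].append((player, timestamp))
        return matched

scorer = TagScorer()
scorer.add_tag("anna", "windmill", 12.0)  # first entry, no match yet
scorer.add_tag("ben", "windmill", 18.5)   # within 10s of anna's tag: both score
scorer.add_tag("cas", "windmill", 40.0)   # too late to match either entry
```

Keeping the timestamp with each tag is also what makes the time-based annotation possible – the same data doubles as an index into the video.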

Linking it to twitter – shout out to friends to come join them playing.  Generating traffic – one of the main challenges. Altruistic message ‘help the archive’ ‘improve access to collections’ came out of research with users on messages that worked. Worked with existing communities.

Results, first six months – 44,362 pageviews. 340,000 tags to 604 items, 42,068 unique tags.
Matches – 42% of tags were entered more than 2 times. They also looked at vocabularies (GTAA, Cornetto): 1/3 of the words were valid Dutch words, but only a few were part of the thesauri.  Tags were evaluated by documentalists. For documentary film, 85% of tags were useful; for a reality series (with less semantic density) tags were less useful.
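The vocabulary evaluation they describe – which contributed tags are valid words, and which of those also appear in a thesaurus – amounts to checking set membership. A minimal sketch, with made-up word lists standing in for the real Cornetto and GTAA data:

```python
# Hypothetical word lists standing in for a Dutch dictionary and a
# thesaurus; the real evaluation used Cornetto and the GTAA.
valid_words = {"molen", "fiets", "gracht", "koningin", "tulp"}
thesaurus_terms = {"molen", "gracht"}

tags = ["molen", "fiets", "xyzzy", "gracht", "tulp", "asdf"]

valid = [t for t in tags if t in valid_words]
in_thesaurus = [t for t in valid if t in thesaurus_terms]

print(f"{len(valid)}/{len(tags)} tags are valid words")
print(f"{len(in_thesaurus)} of those also appear in the thesaurus")
```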

Now looking at how to present tags on the catalogue, Powerhouse Museum-style.  Experimenting with visualising terms, with tag clouds showing where terms are represented; this also makes it easy to navigate within the video – which would have been difficult to do with professional metadata.  Looking at ‘tag gardening’ – inviting people to go back to their tags and click to confirm – e.g. show images with particular tags, get more points for doing it.

Future work – tag matching – synonyms and more specific terms – will get more points for more specific terms.

Panel overview by Costis Dallas, research fellow at Athena, assistant professor at Panteion University, Athens.
He wants to add a different dimension – user-generated content as it becomes an object for memory organisations. New body of resources emerging through these communication practices.
Also, we don’t have a historiography anymore; memory resides in personal information devices.  Mashups, changes in information forms, complex composed information on social networks – these raise new problems for collecting – structural, legal, preservation in context, layered composition.  What do we need to do now in order to be able to make use of digital technologies in appropriate, meaningful ways in the future? New kinds of content, participatory curation are challenges for preservation.

Group discussion (breakout tables)
Discussion about how to attract users. [It wasn’t defined whether it was how to attract specifically users who’ll contribute content or just generally grow the audience and therefore grow the number of content creators within the usual proportions of levels of participation e.g. Nielsen, Forrester; I would also have liked to discuss how to encourage particular kinds of contributions, or to build architectures of participation that provided positive feedback to encourage deeper levels of participation.]

Discussion and conclusions included – go with the strengths of your collections e.g. if one particular audience or content-attracting theme emerges, go with it.  Norway has a national portal where people can add content. They held lots of workshops for possible content creators; made contact with specialist organisations [from which you can take the lesson that UGC doesn’t happen in a vacuum, and that it helps to invest time and resources into enabling participants and soliciting content].  Recording living history.  Physical presence in gallery, at events, is important.  Go where audiences already are; use existing platforms.

Discussion about moderation included – once you have comments, how are they integrated back into collections and digital asset management systems?  What do you do about incorrect UGC displayed on a page?  Not an issue if you separate UGC from museum/authoritative content in the interface design.  In the discussion it turned out that Europeana doesn’t have a definition of ‘moderation’.  IMO, it should include community management, including acknowledging and thanking people for contributions (or rather, moderation is a subset of community management).  It also includes approving or reviewing and publishing content, dealing with corrections suggested by contributors, dealing with incorrect or offensive UGC, adding improved metadata back to collections repositories.

User-generated content and trust – British Library apparently has ‘trusted communities’ on their audio content – academic communities (by domain name?) and ‘everyone else’.  Let other people report content to help weed out bad content.

Then we got onto a really interesting discussion of which country or culture’s version of ‘offensive’ would be used in moderating content.  Having worked in the UK and the Netherlands, I know that what’s considered a really rude swear word and what’s common vocabulary is quite different in each country… but would there be any content left if you considered the lowest common standards for each country?  [Though thinking about it later, people manage to watch films and TV and popular music from other countries so I guess they can deal with different standards when it’s in context.]  To take an extreme content example, a Nazi uniform as memorabilia is illegal in Germany (IIRC) but in the UK it’s a fancy dress outfit for a member of the royal family.

Panel reporting back from various table discussions
Kevin’s report – discussion varied but similar themes across the two tables. One – focus on the call to action, why should people participate, what’s the motivation? How to encourage people to participate? Competitions suggested as one solution, media interest (especially sustained). Notion of core group who’ll energise others. Small groups of highly motivated individuals and groups who can act as catalysts [how to recruit, reward, retain]. Use social media to help launch project.

The 1001 Danish Stories promotional video effectively showed how easy the process of contributing content was, and that it doesn’t have to be perfect (the video includes celebrities working the camera [and also being a bit daggy, which I later realised was quite powerful – they weren’t cool and aloof]).
Giving users something back – it’s not a one-way process. Recognition is important. Immediacy too – if participating in a project, people want to see their contributions acknowledged quickly. Long approval processes lose people.
Removal of content – when different social, political backgrounds with different notions of censorship.

Mette’s report – how to get users to contribute – the answers were mostly to take away the boundaries, and give the users more credit than we otherwise tend to. We always think users will mess things up and experts will be embarrassed by user content, but that’s not the case. In 1001 they had experts correcting other experts. Trust users more, involve experts, ask users what they want. Show you appreciate users, have a dialogue, create community. Make it a part of the life and environment of users. Find out who your users are.

Second group – how Europeana can use the content provided in all its forms. Could build web services to present content from different places, linking between different applications.
How to set up goals for user activity – didn’t get a lot of answers but one possibility is to start and see how users contribute as you go along. [I also think you shouldn’t be experimenting with UGC without some goal in mind – how else will you know if your experiment succeeded?  It also focusses your interaction and interface design and gives the user some parameters (much more useful than an intimidating blank page)].

Judith’s report (including our table) – motivation and moderation in relation to Europeana – challenging as Europeana is not the owner of the material; also dealing with multilingual collections. Culturally-specific offensive comments. Definition and expectations of Europeana moderation. Resources needed if Europeana does the moderation.
Incentives for moderation – improving data, idealism, helping with translations – people like to help translate.

Johan’s report – rewards are important – place users in social charts or give them a feeling of contributing to larger thing; tap into existing community; translate physical world into digital analogue.
Institutional policy – need a clear strategy for e.g. how to integrate the knowledge into the catalogue. Provide training for staff on working with users and online tools. There’s value in employing community managers to give people feedback when they leave content.
Using Amazon’s Mechanical Turk for annotations…
Doing the projects isn’t only of benefit in enriching metadata but also for giving insight into users – discover audiences with particular interests.

Costis commenting – if Europeana only has thumbnails and metadata, is it a missed opportunity to get UGC on more detailed content?

Is Europeana highbrow compared to other platforms like Flickr, FB, so would people be afraid to contribute? [probably – there must be design patterns for encouraging participation from audiences on museum sites, but we’re still figuring out what they are]
Business model for crowdsourcing – producing multilingual resources is perfect case for Europeana.

Open to the floor for questions… Importance of local communities, getting out there, using libraries to train people. Local newspapers, connecting to existing communities.

Notes from Europeana’s Open Culture Conference 2010

The Open Culture 2010 conference was held in Amsterdam on October 14 – 15. These are my notes from the first day (I couldn’t stay for the second day). As always, they’re a bit rough, and any mistakes are mine. I haven’t had a chance to look for the speakers’ slides yet so inevitably some bits are missing.  If you’re in a hurry, the quote of the day was from Ian Davis: “the goal is not to build a web of data. The goal is to enrich lives through access to information”.

The morning was MC’d by Costis Dallas and there was a welcome and introduction from the chair of the Europeana Foundation before Jill Cousins (Europeana Foundation) provided an overview of Europeana. I’m sure the figures will be available online, but in summary, they’ve made good progress in getting from a prototype in 2008 to an operational service in 2010. [Though I have written down that they had 1 million visits in 2010, which is a lot less than many of the national museums in the UK, though obviously those museums have had longer to establish a brand and a large percentage of their stats are probably in the ‘visit us’ areas rather than collections areas.]

Europeana is a super-aggregator, but doesn’t show the role of the national or thematic aggregators or portals as providers/collections of content. They’re looking to get away from a one-way model to the point where they can get data back out into different places (via APIs etc). They want to move away from being a single destination site to putting information where the user is, to continue their work on advocacy, open source code etc.

Jill discussed various trends, including the idea of an increased understanding that access to culture is the foundation for a creative economy. She mentioned a Kenneth Gilbraith [?] quote on spending more on culture in recession as that’s where creative solutions come from [does anyone know the reference?]. Also, in a time of increasing nationalism, Europeana provides a counter-example of trans-European cooperation and culture. Finally, customer needs are changing as visitors move from passive recipients to active participants in online culture.

Europeana [or the talk?] will follow four paths – aggregation, distribution, facilitation, engagement.

  • Aggregation – build the trusted source for European digital cultural material. Source curated content, linked data, data enrichment, multilinguality, persistent identifiers. 13 million objects but with an 18th–20thC dominance; only 2% of material is audio-visual [?]. Looking towards publishing metadata as linked open data, to make Europeana and cultural heritage work on the web. An example of tagging content with controlled vocabularies: Vikings as tagged by Irish and Norwegian people – from ‘pillagers’ to ‘loving fathers’. They can map between these vocabularies with linked data.
  • Distribution – make the material available to the user wherever they are, whenever they want it. Portals, APIs, widgets, partnerships, getting information into existing school systems.
  • Facilitate innovation in cultural heritage. Knowledge sharing (linked data), IPR business models, policy – advocacy and public domain, data provider agreements. If you write code based on their open sourced applications, they’d love you to commit any code back into Europeana. Also, look at Europeana labs.
  • Engagement – create dialogue and participation. [These slides went quickly, I couldn’t keep up]. Examples of the Great War Archive into Europe [?]. Showing the European connection – Art Nouveau works across Europe.

The next talk was Liam Wyatt on ‘Peace, love and metadata’, based in part on his experience at the British Museum, where he volunteered for a month to coordinate the relationship between Wikipedia as representative of the open web [might have mistyped that, it seems quite a mantle to claim] and the BM as representative of [missed it]. The goal was to build a proactive relationship of mutual benefit without requiring change in the policies or practices of either. [A nice bit of realism because IMO both sides of the museum/Wikipedia relationship are resistant to change and attached firmly to parts of their current models that are in conflict with the other organisation.]

The project resulted in 100 new Wikipedia articles, mostly based on the BM/BBC A History of the World in 100 Objects project (AHOW). [Would love to know how many articles were improved as a result too]. They also ran a ‘backstage pass’ day where Wikipedians come on site, meet with curators, backstage tour, then they sit down and create/update entries. There were also one-on-one collaborators – hooking up Wikipedians and curators/museums with e.g. photos of objects requested.

It’s all about improving content, focussing on personal relationships, leveraging the communities; it didn’t focus on residents (his own work), none of them are content donation projects, and every institution has different needs but can do some version of this.

[I’m curious about why it’s about bringing Wikipedians into museums and not turning museum people into Wikipedians but I guess that’s a whole different project and may be result from the personal relationships anyway.]

Unknown risks are accounted for and overestimated. Unknown rewards are not accounted for and underestimated. [Quoted for truth, and I think this struck a chord with the audience.]

Reasons he’s heard for restricting digital access… The most common is ‘preserving the integrity of the collection’, but it sounds like a need to approve content so they can approve of its usages. As a result he’s seen convoluted copyright claims – copyright is an easy tool to use to retain control.

Derivative works. Commercial use. Different types of free – freedom to use, freedom to study and apply knowledge gained; freedom to make and redistribute copies; [something else].

There are only three applicable licences for Wikipedia. Wikipedia is a non-commercial organisation, but they don’t accept any non-commercially licenced content as ‘it would restrict the freedom of people downstream to re-use the content in innovative ways’. [But this rules out much museum content, whether rightly or not, for reasons varying from legal requirements to preference. Licence wars (see the open source movement) are boring, but the public would have access to more museum content on Wikipedia if that restriction was negotiable. Whether that would outweigh the possible ‘downstream’ benefit is an interesting question.]

Liam asked the audience, do you have a volunteer project in your institution? do you have an e-volunteer program? Well, you do already, you just don’t know it. It’s a matter of whether you want to engage with them back. You don’t have to, and it might be messy.

Wikipedia is not a social network. It is a social construction – it requires a community to exist but socialising is not the goal. Wikipedia is not user generated content. Wikipedia is community curated works. Curated, not only generated. Things can be edited or deleted as well as added [which is always a difficulty for museums thinking about relying on Wikipedia content in the long term, especially as the ‘significance’ of various objects can be a contested issue.]

Happy datasets are all alike; every unhappy dataset is unhappy in its own way. A good test of data is that it works well with others – technically or legally.

According to Liam, Europeana is the 21st-century version of the gallery of paintings – it’s a thumbnail gallery, but it could be so much more if the content was technically and legally able to be re-used and integrated.
Data already comes with enough restrictions, e.g. copyright, donor restrictions, but if it comes without restrictions, it’s a shame to add them. ‘Leave the gate as you found it’.

‘We’re doing the same thing for the same reason for the same people in the same medium, let’s do it together.’

The next sessions were ‘tasters’ of the three thematic tracks of the second part of the day – linked data, user-generated content, and risks and rewards. This was a great idea because I felt like I wasn’t totally missing out on the other sessions.

Ian Davis from Talis talked about ‘linked open culture’ as a preview of the linked data track: how to take practices learned from linked data and apply them to the open culture sector. We're always looking for ways to exchange information and communicate more effectively, and we're no longer limited by the physicality of information. ‘The semantic web fundamentally changes how information, machines and people are connected together’. The semantic web and its powerful network effects are enabling a radical transformation away from islands of data. One question is: does preservation require protection and isolation, or copying it as widely as possible?

Conjecture 1 – data outlasts code. MARC stays forever, code changes. This implies that open data is more important than open source.
Conjecture 2 – structured data is more valuable than unstructured. Therefore we should seek to structure our data well.
Conjecture 3 – most of the value in our data will be unexpected and unintended. Therefore we should engineer for serendipity.

‘Provide and enable’ – UK National Archives phrase. Provide things you’re good at – use unique expertise and knowledge [missed bits]… enable as many people as possible to use it – licence data for re-use, give important things identifiers, link widely.
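The ‘enable’ half – give important things identifiers, link widely – can be sketched as data. A minimal illustration (the museum URI, object and Wikidata link below are hypothetical, and plain subject–predicate–object tuples stand in for a real RDF library):

```python
# A sketch of 'give important things identifiers, link widely':
# one museum object with a persistent URI, linked to an external
# identifier other people already use. All URIs are made up.

OBJECT_ID = "http://example-museum.org/id/object/12345"

triples = [
    (OBJECT_ID, "dc:title", '"Marine chronometer"'),
    (OBJECT_ID, "dc:creator", "http://example-museum.org/id/person/678"),
    # Linking widely: point at identifiers outside your own silo.
    (OBJECT_ID, "owl:sameAs", "http://www.wikidata.org/entity/Q123"),
]

def serialise(triples):
    """Write the triples in a simple line-based, N-Triples-like form."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

print(serialise(triples))
```

The point of the exercise is the stable identifier: anyone downstream can merge their own statements about `OBJECT_ID` with yours without asking permission first.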

‘The goal is not to build a web of data. The goal is to enrich lives through access to information.’
[I think this is my new motto – it sums it up so perfectly. Yes, we carry on about the technology, but only so we can get it built – it’s the means to an end, not the end itself. It’s not about applying acronyms to content, it’s about making content more meaningful, retaining its connection to its source and original context, making the terms of use clear and accessible, making it easy to re-use, encouraging people to make applications and websites with it, blah blah blah – but it’s all so that more people can have more meaningful relationships with their contemporary and historical worlds.]

Kevin Sumption from the National Maritime Museum presented on the user-generated content track. A look ahead – the cultural sector and new models… User-generated content (UGC) is a broad description for content created by end users rather than traditional publishers. Museums have been active in photo-sharing, social tagging and Wikipedia editing.

Crowdsourcing, e.g. reCAPTCHA [digitising books, one registration form at a time]. His team was inspired by that approach and created a project called ‘Old Weather’ – people review the logs of WWI British ships and transcribe the content, especially meteorological data. This fills a gap in the meteorological dataset for 1914–1918, allows weather in the period to be modelled, and contributes to the understanding of global weather patterns.
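Projects like this typically have several volunteers transcribe the same page and then reconcile the readings. A common reconciliation technique (a sketch only – not necessarily Old Weather's actual method, and the field names and values below are invented) is a per-field majority vote:

```python
from collections import Counter

def majority_vote(transcriptions):
    """Pick the most common reading for each field across volunteers.

    transcriptions: list of dicts, one per volunteer, mapping a field
    name to the value that volunteer typed in.
    """
    fields = sorted(set().union(*(t.keys() for t in transcriptions)))
    consensus = {}
    for field in fields:
        values = [t[field] for t in transcriptions if field in t]
        consensus[field] = Counter(values).most_common(1)[0][0]
    return consensus

# Three hypothetical volunteer readings of the same 1915 log entry.
readings = [
    {"date": "1915-03-02", "air_temp_f": "48", "pressure_in": "29.91"},
    {"date": "1915-03-02", "air_temp_f": "48", "pressure_in": "29.81"},
    {"date": "1915-03-02", "air_temp_f": "43", "pressure_in": "29.91"},
]
print(majority_vote(readings))
```

Each field is resolved independently, so one volunteer misreading the temperature doesn't spoil their correct reading of the pressure.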

Also working with Oxford Uni, the Rutherford Institute and Zooniverse on Solar Stormwatch – a solar weather forecast. The museum is working with research institutions to provide data to solve real-world problems. [Museums can bring audiences to these projects and re-ignite interest in science – you can sit at home or on the train and make real contributions to ongoing research. How cool is that?]

Community collecting, e.g. the Mass Observation project of 1937 – relaunched now, and you can train to become an observer. You get a brief, e.g. families on holiday.

BBC WW2 People’s War – archive of WWII memories. [check it out]

RunCoCo – tools for people to set up community-led, community-generated projects.

Community-led research – a bit more contentious – e.g. the Guardian and MPs' expenses: putting data in the hands of the public and trusting them to generate content. [Though if you're just getting people to help filter up interesting content for review by trusted sources, it's not that risky.]

The final thematic track preview was by Charles Oppenheim from Loughborough University, on the risks and rewards of placing metadata and content on the web. Legal context – authorisation of the copyright holder is required for [various acts including putting it on the web] unless: it's out of copyright; you have explicit permission from the rights holder (not an implied licence just because it's online); permission has been granted under a licensing scheme; or the work has been created by a member of staff or under contract with IP assigned.

Issues with cultural objects – media rich content – multiple layers of rights, multiple rights holders, multiple permissions often required. Who owns what rights? Different media industries have different traditions about giving permission. Orphan works.

Possible non-legal ramifications of IPR infringements – loss of trust with rights holders/creators; loss of trust with the public; damage to reputation/bad press; breach of contract (funding bodies or licensors); additional fees/costs; takedown of content or the entire service.

Help is at hand – Strategic Content Alliance toolkit [online].

Copyright has less to do with law than with risk management – assess the risks and work out how you will minimise them.

Risks beyond IPR – defamation; liability for provision of inaccurate information; illegal materials e.g. pornography, pro-terrorism, violent materials, racist materials, Holocaust denial; data protection/privacy breaches; accidental disclosure of confidential information.

High risk – anything you make money from; copying anything that is in copyright and commercially available.
Low risk – orphan works of low commercial value – letters, diaries, amateur photographs, films and recordings by lesser-known people.
Zero-risk material also exists.
Risks on the other side of the coin [aka excuses for not putting stuff up]

Are small museums the long tail?

On the way home from the Semantic Web Think Tank last week (see previous post), I suddenly thought: are small or specialised museums the long tail?

Each museum by itself would represent a tiny proportion of the overall use of museum collections online, but if you put all that usage together, would their collections in fact have a higher rate of use than those of more ‘popular’ museums?

At the moment I don’t think there’s any way to find out, because so many small or specialised museums don’t have collections online, through a lack of expertise, digitisation resources or an easy-to-use publication infrastructure. Still, it’s an interesting question.