'An (even briefer) history of open cultural data' at GLAM-Wiki 2013

These are some of my notes for my invited plenary talk at GLAM-Wiki 2013 (Galleries, Libraries, Archives, Museums & Wikimedia, #GLAMWiki), held at the British Library on April 12-13, 2013. I don't think I stuck that closely to them on the day, and in the interests of brevity I've left out the 'timeline' bits (but you can read about some of them in a related MuseumID article, 'Where next for open cultural data in museums?') to focus on the lessons to be learnt from changes so far. There were lots of great talks and discussion at the event, and you can view some of the presentations on Wikimedia UK's YouTube channel.

A (now very) brief history of open cultural data

Firstly, thank you for the invitation to speak… This morning I want to highlight some key moments of change in the history of open cultural data – a history not only of licenses and data, but also of conversations, standards, and collaborations, of moments where things changed… I've included key moments from funders, legislative influences and the commercial sector too, as they create the context in which change happens and often have an effect on what's considered possible. I'll close by considering some of the lessons learnt.

[Please help improve this talk]

A caveat – there may well be a bias towards the English-speaking world (and to museums, because of my background). If you know of an open GLAM (gallery, library, archive, museum) data source I've missed, you can add it to the open cultural data/GLAM API wiki… or Lotte Belice's timeline of open culture milestones.

Definitions

'Open cultural data' is data from cultural institutions that is made available for use in a machine-readable format under an open licence. But each of the words 'open', 'cultural' and 'data' is slightly more complicated, so I'll unpack them a little…

Open


While the degree of openness required to be 'open' data can be contentious, at its simplest, 'open' refers to content that is available for use outside the institution that created it, whether for school homework projects, academic monographs or mobile phone apps. 'Open' may refer to licences that clarify the permissions and restrictions placed on data, or to the use of non-proprietary digital technologies, or ideally, to a combination of both open licences and technologies.

Ideally, open data is freely available for use and redistribution by anyone for any purpose, but in reality there are often restrictions. GLAMs may limit commercial use by licensing content for 'non-commercial use only', but as there is no clear definition of 'non-commercial use' in Creative Commons licences, some developers may choose not to risk using a dataset with an unclear licence. GLAMs may also release data for commercial use but still require attribution, either to help retain the provenance of the content, to help people find their way to related content or just because they'd like some credit for their work. GLAMs might also release data under custom licences that deal with their specific circumstances, but they are then difficult to integrate with content from other openly-licensed datasets.

Hybrid licensing models are a pragmatic solution for the current environment. They at least allow some use and may contribute to greater use of open cultural data while other issues are being worked out. For example, some institutions in the UK are making lower-resolution images available for re-use under an open licence while reserving high-resolution versions for commercial sales and licensing. Or they may differentiate between scholarly and commercial use, or use more restrictive licences for commercially valuable images and release everything else openly.

I think this type of access is better than nothing, particularly if organisations can learn from the experience and release more data next time. Because these hybrid models are often experimental, their reception is important, and it's helpful for GLAMs to be able to show they've had a positive impact and hopefully helped create relationships with groups like Wikipedia.

Cultural

Cultural data is data about objects, publications (such as books, pamphlets, posters or musical scores), archival material, etc, created and distributed by museums, libraries, archives and other organisations.

Data

It's a useful distinction to discuss early with other cultural heritage staff as it's easy to be talking at cross-purposes: data can refer to different types of content, from metadata or tombstone records (the basic titles, names, dates, places, materials, etc of a catalogue record), to entire collection records (including data such as researched and interpretive descriptions of objects, bibliographic data, related themes and narratives) to full digital surrogates of an object, document or book as images or transcribed text. Some organisations release open metadata, others release all their data including their images. If you can't do open data (full content or 'digital surrogates' like photographs or texts) then at least open up the metadata (data about the content) as e.g. CC0 and the rest with another licence. Releasing data may involve licensing images; offering downloads from catalogue sites; 'content donations'; APIs and machine-facing interfaces; term lists, etc. Much of the data that isn't images isn't immediately interesting, and may be designed for inter-collections interoperability or mashups rather than media commons.

Why is open cultural data important?

Before I go on, why do we care? Open cultural data is the foundation on which many projects can be built. It helps achieve an organisation's goals and mission; can help increase engagement with content; can create a 'network effect' with related institutions; and can be re-used by people who share your goals around access to knowledge and information – people like Wikipedians.

Some key moments in open cultural data

Events I discussed included the founding of Wikimedia, Europeana and Flickr Commons, previous GLAM-Wiki conferences, changes in licences for art images, library catalogue records and museum content, GLAM APIs and linked data services and the launch of the Digital Public Library of America next week.

Lessons learnt

Many of the changes are the results of years of conversation and collaboration – change is slow but it does happen. GLAMs work through slow iterations – try something, and if no-one dies, they'll try something else. We are all ambassadors, and we are all translators, helping each domain understand the other.

Contradictory things GLAMs are told they must do

  • Give content away for the benefit of all
  • Monetise assets; protect against loss of potential income; protect against mis-use of collections; conserve collections in perpetuity; protect the IP of artists; demonstrate ROI on digitisation

It's not easy for GLAMs to release all their data under an entirely open licence, but they don't hold back just to be annoying – it's important to understand some of the pressures they're under. For example, GLAMs usually need to be able to track uses of their data and content to show the impact of digitising and publishing content, so they prefer attribution licences.

The issue of potential lost income – imaginary money that could be made one day if circumstances change, or profit that someone else makes off their opened data – is particularly hard to deal with [and here I ad-libbed, saying that it was like worrying about failing to meet the love of your life because you got on a different tube carriage – you can't live your life chasing ghosts]. Ideally, open data needs to be understood as an input to the creative economy rather than an item on the balance sheet of an individual GLAM.

GLAMs worry about reputational damage, whether appearing on the front page of a tabloid newspaper for the 'wrong' reasons, questions being asked in Parliament, or critique from Wikipedians.  Over time, their mindset is changing from keeping 'our data' to being holders, custodians of our shared heritage.

Conversations, communities, collaborations

Conversations matter… we're all working towards the same goal, but we have different types of anxieties and different problems we have to address.

GLAMs are about collections, knowledge, and audiences. Unlike most online work, they are used to seeing the excitement people experience walking through their door – help GLAMs understand what Wikipedians can do for different audiences by making those audiences real to them. GLAMs are also used to being wined and dined before you lay the hard word on them. Just because you don't need to ask for permission to use content doesn't mean you shouldn't start a conversation with an organisation. There are lots of people with similar goals inside organisations, so try to find them and work with them. Trust is a currency, don't blow it!

Being truly collaborative sometimes means compromising (or picking your battles) and it definitely means practising empathy. Open data people could stop talking about open data as something you *do* to GLAMs, and GLAMs could stop thinking open data people just want to make their lives difficult.

The role of higher powers

Government attitudes to open data make a big difference and they can also change the risks associated with publishing orphan works.  Governments can also help GLAMs open up their content by indemnifying them against the chance that someone else will monetise their data – consider it not a failure of the GLAM but a contribution to the creative and digital economy.

Things that are better than a poke in the eye with a sharp stick

  1. Kittens (and puppies)
  2. Cultural data that's available online but isn't (yet) openly licensed
  3. Cultural data online that is licensed for non-commercial use

Yes, the last two aren't ideal, but they are a great deal better than nothing.

Into the future…

GLAMs and Wikipedians may move at different paces, and may have different priorities and different ways of viewing the world, but we're all working towards the same goals. Not everything is open, but a lot more is open than it used to be. I sensed yesterday [the first day of the conference] that there are still some tensions between Wikimedians and GLAMers, moments when we need to take a deep breath and put empathy before a pithy put-down, but I loved that Kat Walsh's welcome yesterday described how Wikipedia used to focus on how it was different from others but now focuses on reaching out to others and figuring out how we're the same.

GLAMs and Wikipedians have already used open cultural data to make the world a better place. Let's celebrate the progress we've made and keep working on that…

GLAM-WIKI 2013 Friday attendees photograph by Mike Peel (www.mikepeel.net).

Congratulations to everyone who helped make it a great event, but particularly to Daria Cybulska and Andrew Gray (@generalising) for making everything work so smoothly, and Liam Wyatt (@wittylama) for the original invitation to speak.

Join in the conversation about Wikimedia @ MW2010

Wikimedia@MW2010 is a workshop to be held in Denver in April, just before the Museums and the Web 2010 conference.  The goal is to develop 'policies that will enable museums to better contribute to and use Wikipedia or Wikimedia Commons, and for the Wikimedia community to benefit from the expertise in museums'.

If you've got stuff you want to say, you can dive right into the conversation – there's a whole bunch of conversations at http://conference.archimuse.com/forums/wikimediamw2010, including 'Legal and Business Model Barriers to Collaboration', 'Notability Criteria' and 'Metrics for Museums on Wikipedia'.

I'm going to be at the workshop and will do my best to represent any issues raised at the meeting.  I think it's particularly important that we avoid 'Feeling glum after GLAM-WIKI' if we possibly can, so I'd like to go there with a really good understanding of the possible points of resistance, clashes in organisational culture or world view, incompatible requirements or wishlists so that they can be raised and hopefully dealt with during the in-person workshop.  I'd love to hear from you if there are messages you want to pass on.

I'm also thinking about an informal meetup in London to help cultural heritage people articulate some of the issues that might help or hinder collaboration so they can be represented at the workshop – if you're a museum, gallery, archive, library or general cultural heritage bod, would that be useful for you?

Why do museums prefer Flickr Commons to Wikimedia Commons?

A conversation has sprung up on Twitter about why museums prefer Flickr Commons to Wikimedia Commons after Liam Wyatt, Vice President of Wikimedia Australia, posted "Flickr Commons is FULL for 2010. GLAMs, Fancy sharing with #Wikimedia commons instead?" and I responded with "has anyone done audience research into why museums prefer Flickr to Wikimedia commons?". I've asked before because I think it's one of those issues where the points of resistance can be immensely informative.

I was struck by the speed and thoughtfulness of responses from kajsahartig, pekingspring, NickPoole1, richardmccoy and janetedavis, which suggested that the question hit a nerve.

Some of the responses included:

Kajsa: Photos from collections have ended up at wikipedia without permission, that never happened with Flickr, could be one reason [and] Or museums are more benevolent when it happens at Flickr, it's seen more as individuals' actions rather than an organisations'?

Nick: Flickr lets you choose CC non-commercial licenses, whereas Wikimedia Commons needs to permit potential commercial use?

Janet: Apart fr better & clear CC licence info, like Flickr Galleries that can be made by all! [and] What I implied but didn't say before: Flickr provides online space for dialogue about and with images.

Richard: Flickr is so much easier to view and search than WM. Commons, and of course easier to upload.

Twitter can be a bit of an echo chamber at times, so I wanted to ask you all the question in a more accessible place.   So, is it true that museums prefer Flickr Commons to Wikimedia Commons, and if so, why?

[Update: Liam's new blog post addresses some of the concerns raised – this responsiveness to the issues is cheering.  (You can get more background at Wikipedia:Advice for the cultural sector and Wikipedia:Conflict of interest.)

Also, for those interested in wikimedia/wikipedia* and museums, there's going to be a workshop 'for exploring and developing policies that will enable museums to better contribute to and use Wikipedia or Wikimedia Commons, and for the Wikimedia community to benefit from the expertise in museums', Wikimedia@MW2010, at Museums and the Web 2010. There's already a thread, 'Wikimedia Foundation projects and the museum community' with some comments. I'd love to see the 'Incompatible recommendations' section of the GLAM-Wiki page discussed and expanded.

* I'm always tempted to write 'wiki*edia' where * could be 'm' or 'p', but then it sounds like South Park's plane-rium in my head.]

[I should really stop updating, but I found Seb Chan's post on the Powerhouse Museum blog, Why Flickr Commons? (and why Wikimedia Commons is very different) useful, and carlstr summed up a lot of the issues neatly: "One of the reasons is that Flickr is a package (view, comment search aso). WC is a archive of photos for others to use. … I think Wikipedia/Wikimedia have potential for the museum sector, but is much more complex which can be deterrent.".]

The NPG's response to the Wikimedia kerfuffle

[Apparently responses are being listed on a Wikimedia page, which I suppose makes sense but please bear in mind this is usually read by about five people who know my flippant self in real life]

I haven't been able to get the press release section of the National Portrait Gallery to load, so I'm linking to an email from the NPG posted as a comment on another blog. I'm still thinking this through, but currently the important bit, to me, is this:

The Gallery is very concerned that potential loss of licensing income from the high-resolution files threatens its ability to reinvest in its digitisation programme and so make further images available. It is one of the Gallery’s primary purposes to make as much of the Collection available as possible for the public to view.

Digitisation involves huge costs including research, cataloguing, conservation and highly-skilled photography. Images then need to be made available on the Gallery website as part of a structured and authoritative database.

Obviously, I am paid by a museum to put things online so I might be biased towards something that ultimately means my job exists – but while a government funding gap exists, someone has to pay the magical digitisation fairies. [This doesn't mean I think it's right, but the situation is not going to be changed by an adversarial relationship between WMF and the cultural heritage sector, which is why this whole thing bothers me.  Lots of good work explaining the Commons models and encouraging access is being undone.]

You can't even argue that the NPG is getting increased exposure or branding through the use of their images, as there's a big question over whether images hosted on Wikimedia are being incorrectly given new attribution and rights statements.  Check the comment about the image on this blog post, and the Wikipedia statement from Wikimedia about the image and the original image page.  

To use a pub analogy, is Wikimedia the bad mate who shouts other people a round on your tab?