repositories – Open Objects

BBC to put 200,000 paintings from the Public Catalogue Foundation online

This could be fantastic – I hope the BBC will work with the museum sector to complement the work they're already doing or planning to get their collections online. From the Guardian, BBC to put nation's oil paintings online:

A partnership with the Public Catalogue Foundation charity will see all the UK's publicly owned oil paintings – 80% of which are not on public display – placed on the internet by 2012.

The BBC said it wanted to establish a new section of its bbc.co.uk website, called Your Paintings, where users could view and find information on the UK's national collection.

The Public Catalogue Foundation, launched in 2003, is 30% of the way through cataloguing the UK's collection of oil paintings.

In addition the BBC said it was talking to the Arts Council about giving the public free online access to its archive for the first time, including its wide-ranging film collection dating back to the 1950s.

…

[Mark Thompson, the BBC director general, said:] "Today we are not only reaffirming our commitment to arts, but we're announcing a series of measures that will put this relationship on an even stronger footing. Through innovative new partnerships, I believe the BBC can deliver big, bold arts programming that is accessible, distinctive and enjoyable."

I do wonder what Time Out's Tony Elliott would make of it.

'The strikethrough is the canonical symbol of the Web'

Below is a quote from Wired's Chris Anderson on museums, curatorial authority and the long tail, from a Washington Post report, 'Smithsonian Click-n-Drags Itself Forward' on Smithsonian 2.0 ('A Gathering to Re-Imagine the Smithsonian in the Digital Age').

The quote really covers two issues – making failures and mistakes in public and leaving them there, and training external volunteers and experts to curate parts of collections, because no one curator can be authoritative on everything in their remit: "in exchange for a slight diminution of the credentialed voice for a small number of things, you would get far more for a lot of things".

I suspect this is a false dichotomy – there's a place for both internal and external expertise. The Science Museum object wiki doesn't mean the rest of the collection catalogue and interpretation has no value or relevance. The challenge lies in presenting organisation and user-contributed content in the same interface – can those boundaries be removed? Is it wise to try? And what about taking external content back into the catalogue?

This isn't a new conversation for museum technologists, but it's a conversation I'd love to have with curators. I've never been sure how the technologists who get really excited by the possibilities of sharing content online in various ways can go about working with curators to find the best way of managing it so that the public, the collections and the curators benefit.

Anyway, onto Chris Anderson:

The discovery of the "long tail" principle has implications for museums because it means there is vast room at the bottom for everything. Which means, Anderson said, that curators need to get over themselves. Their influence will never be the same.

"The Web is messy, and in that messiness comes something new and interesting and really rich," he said. "The strikethrough is the canonical symbol of the Web. It says, 'We blew it, but we are leaving that mistake out there. We're not perfect, but we get better over time.' "

If you think that notion gives indigestion to an organization like the Smithsonian — full of people who have devoted much of their lifetimes to bringing near-perfect luster to some tiny pearl of truth — you would be correct.

The problem is, "the best curators of any given artifact do not work here, and you do not know them," Anderson told the Smithsonian thought leaders. "Not only that, but you can't find them. They can find you, but you can't find them. The only way to find them is to put stuff out there and let them reveal themselves as being an expert."

Take something like, oh, everything the Smithsonian's got on 1950s Cold War aircraft. Put it out there, Anderson suggested, and say, "If you know something about this, tell us." Focus on those who sound like they have phenomenal expertise, and invest your time and effort into training these volunteers how to curate. "I'll bet that they would be thrilled, and that they would pay their own money to be given the privilege of seeing this stuff up close. It would be their responsibility to do a good job" in authenticating it and explaining it. "It would be the best free labor that you can imagine."

It didn't go down easily among the thought leaders, who have staked their lives' work on authoritativeness, on avoiding strikethroughs. What about the quality and strength of the knowledge we offer? asked one Smithsonian attendee.

You don't get it, Anderson suggested. "There aren't enough of you. Your skills cannot be invested in enough areas to give that quality."

It's like Wikipedia and the Encyclopedia Britannica, Anderson said. Some Wikipedia entries certainly are not as perfectly polished as the Britannica. But "most of the things I'm interested in are not in the Britannica. In exchange for a slight diminution of the credentialed voice for a small number of things, you would get far more for a lot of things. Something is better than nothing." And right now at the Smithsonian, what you get, he said, is "great" or "nothing."

"Is it our job to be smart and be the best? Or is it our job to share knowledge?" Anderson asked.

Global, not institutional, repositories FTW

In Some (more) thoughts on repositories, Andy Powell writes about academic repositories of research publications, but I think it's applicable to the cultural heritage sector too. Particularly when he writes on 'fit with the web':

Concentration
Global discipline-based repositories are more successful at attracting content than institutional repositories. … This is no surprise. It's exactly what I'd expect to see. Successful services on the Web tend to be globally concentrated (as that term is defined by Lorcan Dempsey) because social networks tend not to follow regional or organisational boundaries any more.

Web architecture
Take three guiding documents – the Web Architecture itself, REST, and the principles of linked data. Apply liberally to the content you have at hand – repository content in our case. Sit back and relax.

Resource discovery
On the Web, the discovery of textual material is based on full-text indexing and link analysis. In repositories, it is based on metadata and pre-Web forms of citation. One approach works, the other doesn't. (Hint: I no longer believe in metadata as it is currently used in repositories).

The museum sector has already created cross-institutional repositories (broadly defined, I don't care if it's a federated search or a big central pot of content), but are they understood and championed well enough? Are they maintained and integrated into on-going content creation and editing processes? Are their audiences encouraged to personalise and re-use the content?

Sadly also still relevant:

Across the board we are seeing a growing emphasis on the individual, on user-centricity and on personalisation (in its widest sense). … Yet in the repository space we still tend to focus most on institutional wants and needs. I've characterised this in the past in terms of us needing to acknowledge and play to the real-world social networks adopted by researchers. As long as our emphasis remains on the institution we are unlikely to bring much change to individual research practice.

Lots of people working in digital cultural heritage get it – but they're not necessarily the ones at the decision-making levels, and they're not necessarily in on projects from the start to help make the project design user-centred and the content (technically and semantically) interoperable.

FTW, by the way, stands for 'For The Win', defined by Wikipedia as 'Of something which completes a process in a successful manner'.

"The coolest thing to be done with your data will be thought of by someone else"

I discovered this ace quote, "the coolest thing to be done with your data will be thought of by someone else", on JISC's Common Repository Interfaces Group (CRIG) site, via The Repository Challenge. The CRIG was created to "help identify problem spaces in the repository landscape and suggest innovative solutions. The CRIG consists of a core group of technical, policy and development staff with repository interface expertise. It encourages anyone to join who is dedicated and passionate about surfacing scholarly content on the web."

Read 'repository or federated search' for 'repository' (or think of a federated search as a pseudo-repository) and 'cultural heritage' for 'scholarly' content, and it sounds like an awful lot of fun.

It's also the sentiment behind the UK Government's Show Us a Better Way, the Mashed Museum days and a whole bunch of similar projects.

Fun with Freebase

A video of a presentation to the Freebase User Group with some good stuff on data mining, visualisation (and some bonus API action) via the Freebase blog.

If you haven't seen it before, Freebase is 'an open database of the world's information', 'free for anyone to query, contribute to, build applications on top of, or integrate into their websites'. Check out this sample entry on the early feminist (and Londoner) Mary Wollstonecraft. The Freebase blog is generally worth a look, whether you're interested in Freebase or just thinking about APIs and data mashups.

Another model for connecting repositories

Dr Klaus Werner has been working with Intelligent Cultural Resources Information Management (ICRIM) on connecting repositories or information silos from "different cultural heritage organizations – museums, superintendencies, environmental and architectural heritage organizations" to make "information resources accessible, searchable, re-usable and interchangeable via the internet".

You can read more on these CAA07 conference slides: ICRIM: Interconnectivity of information resources across a network of federated repositories (pdf download), and the abstract from the CAA07 paper might also provide some useful context:

The HyperRecord system, used by the Capitoline Museums (Rome) and the Bibliotheca Hertziana (Max-Planck Institute, Rome) and developed as Culture2000 project, is a framework for the inter-connectivity of information resources from museums, archives and cultural institutes.
…
The repositories offer both the usual human interface for research (fulltext, title, etc.) and a smart REST API with a powerful behind-the-scenes direct machine-to-machine facility for querying and retrieving data.
…
The different information resources use digital object identifiers in the form of URNs (up to now, mostly for museum objects) for identification and direct-access. These allow easy aggregation of contents (data, records, documents) not only inside a repository but also across boundaries using the REST API for serving XML over a plain HTTP connection, in fact creating a loosely coupled network of repositories.
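To make the idea of aggregating by URN a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the XML layout, the URNs and the field names are invented and are not taken from the HyperRecord system, and the two documents are inline strings standing in for responses that would really be fetched over HTTP.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML responses from two separate repositories.
# The element names and URNs are invented for this sketch.
MUSEUM_XML = (
    '<records>'
    '<record urn="urn:example:object:42"><title>Marble bust</title></record>'
    '</records>'
)
LIBRARY_XML = (
    '<records>'
    '<record urn="urn:example:object:42"><photo>bust-front.jpg</photo></record>'
    '<record urn="urn:example:object:7"><photo>relief.jpg</photo></record>'
    '</records>'
)


def aggregate_by_urn(*xml_documents):
    """Merge the fields of records that share a URN across documents."""
    merged = defaultdict(dict)
    for document in xml_documents:
        for record in ET.fromstring(document).findall("record"):
            urn = record.get("urn")
            for field in record:
                # Later documents overwrite earlier ones on a field clash.
                merged[urn][field.tag] = field.text
    return dict(merged)


combined = aggregate_by_urn(MUSEUM_XML, LIBRARY_XML)
```

Because the URN is the only shared key, neither repository needs to know about the other, which is roughly what 'loosely coupled' means here.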

Thanks to Leif Isaksen for putting Dr Werner in contact with me after he saw his paper at CAA07.

MultiMimsy database extractions and the possibilities for OAI-based collections repositories

I've uploaded the slides from my talk to the UK MultiMimsy Users group in Docklands last month: 'MultiMimsy database extractions and the possibilities for OAI-based collections repositories at the Museum of London'.

The first part discusses how to get from a set of data in a collections management system to a final published website, looking at the design process and technical considerations. Willoughby's use of Oracle on the back-end means that any ODBC-compliant tool can query the underlying database and extract collections data.

The paper then looks at some of the possibilities for the Museum of London's OAI-PMH repository. We've implemented an OAI repository for the People's Network Discover Service (PNDS) for Exploring 20th Century London (which also means we're set to get records into Europeana), but I hope that we can use the repository in lots of other ways, including the possibility of using our repository to serve data for federated searches.
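For anyone who hasn't worked with OAI-PMH, a harvester talks to the repository through plain HTTP requests with a `verb` parameter. The sketch below builds `ListRecords` request URLs in Python. The endpoint address and the set name are hypothetical placeholders, not our real ones; `oai_dc` is the Dublin Core format that every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real repository address is not given in this post.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # resumptionToken is an exclusive argument in the protocol:
        # it is sent on its own to fetch the next page of results.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# The set name below is invented for the example.
first_page = list_records_url(BASE_URL, set_spec="20thcenturylondon",
                              from_date="2008-01-01")
next_page = list_records_url(BASE_URL, resumption_token="abc123")
```

The `from` argument is what makes incremental harvesting possible: a partner only needs to ask for records changed since their last visit.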

There's currently some discussion internationally in the cultural heritage sector about repositories vs federated search, but I'm not sure it's an either/or choice. The reasons each is used often have more to do with political or funding factors than with the base technology, but either method, or both, could be used internally or externally depending on the requirements of the project and institution.

I can go into more detail about the scripts we use to extract data from MultiMimsy or send sample scripts if people are interested. They might be a good way to get started if you haven't extracted data from MultiMimsy before, but they won't generally be directly relevant to your data structures as the use of MultiMimsy can vary so widely between types of museums, collections and projects.

'The Vision of ORE': the scholarly graph

ORE (a specification for 'Object Reuse and Exchange') is one of those things I always mean to investigate but never quite find time to look into. This post, The Vision of ORE, makes a convincing case for investigating ORE sooner rather than later, as it "tries to map the true nature of contemporary scholarship onto the web" and "attempts to shift the focus from repositories for scholarship to the complex products of scholarship themselves".

This scholarship cannot be contained by web pages or PDFs put into an institutional repository, but rather consists of what the ORE team has termed “aggregates,” or constellations of digital objects that often span many different web servers and repositories. For instance, a contemporary astronomy article might consist of a final published PDF, its metadata (author, title, publication info, etc.), some internal images, and then—here’s the important part—datasets, telescope imagery, charts, several publicly available drafts, and other matter (often held by third parties) that does not end up in the PDF. Similarly, an article in art history might consist of the historian’s text, paintings that were consulted in a museum, low-resolution copies of those paintings that are available online (perhaps a set of photos on Flickr of the referenced paintings), citations to other works, and perhaps an associated slide show.

…

By forging semantic links between pieces entailed in a work of scholarship it keeps those links active and dynamic and allows for humans, as well as machines that wish to make connections, to easily find these related objects. It also allows for a much better preservation path for digital scholarship because repositories can use ORE to get the entirety of a work and its associated constellation rather than grabbing just a single published instantiation of the work.

The implementation of ORE is perhaps less commonsensical for those who do not wish to dive into lots of semantic web terms and markup languages, but put simply, the approach the ORE group has taken is to provide a permanent locator (i.e., a URI, like a web address) that links to what they call a “resource map,” which in turn describes an aggregation.
…
There has been much talk recently of the social graph, the network of human connections that sites like Facebook bring to light and take advantage of. If widely adopted, ORE could help create the scholarly graph, the networked relations of scholars, publications, and resources.
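As a very rough illustration of the resource map idea, the Python sketch below models a resource map that describes an aggregation, which in turn lists the resources it aggregates. The class and field names echo the ORE terms 'describes' and 'aggregates', but this is a simplified in-memory model under my own assumptions, not an ORE serialisation, and all of the URIs are invented.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse


@dataclass
class Aggregation:
    """A set of resources that together make up one work of scholarship."""
    uri: str
    aggregates: List[str] = field(default_factory=list)


@dataclass
class ResourceMap:
    """The document, with its own URI, that describes one aggregation."""
    uri: str
    describes: Aggregation


def hosts_spanned(resource_map):
    """Return the distinct servers that hold parts of the aggregation."""
    return sorted({urlparse(u).netloc for u in resource_map.describes.aggregates})


# Invented example: an article whose parts live on three different servers.
article = Aggregation(
    uri="https://example.org/aggregations/article-1",
    aggregates=[
        "https://journal.example.org/article-1.pdf",
        "https://data.example.net/article-1/dataset.csv",
        "https://photos.example.com/paintings/123",
    ],
)
resource_map = ResourceMap(uri="https://example.org/rem/article-1", describes=article)
hosts = hosts_spanned(resource_map)
```

The point of the indirection is that a harvester only needs the resource map's URI to discover every part of the work, wherever each part is hosted.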

edna on the benefits of metadata repositories for educational resources

Some random doodling about the possibilities of using the functionalities of an OAI repository as an API led me to information about metadata repositories, harvesting and edna (Education Network Australia, 'Australia's free online network for educators').

In the linked document, Harvesting Overview, they state: "In addition, the edna search API is embedded into numerous other websites – providing access to the edna repository and indexes from external websites. The benefit to you is that, by providing your metadata records for harvesting by edna, you increase exposure to your valuable education and training related resources."

It's a good summary of the processes involved in setting up an OAI repository for harvesting and of the benefits for the organisation, including increased visibility of resources, maximising return on investment [ROI] for created resources and associated metadata, and benefiting from services such as RSS that can be delivered back to the organisation.
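On the harvester's side, the work is mostly parsing XML. The Python sketch below pulls identifiers, titles and subjects out of an OAI-PMH `ListRecords` response in the standard `oai_dc` (Dublin Core) format. The sample response is invented for the example, and a real harvester such as edna's will certainly do more (error handling, paging, deleted records), so treat this as an outline only.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample response with a single record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:resource-1</identifier>
        <datestamp>2008-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example teaching resource</dc:title>
          <dc:subject>History</dc:subject>
          <dc:subject>Museums</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Extract identifier, title and subjects from each harvested record."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        dc_path = "oai:metadata/oai_dc:dc/"
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(dc_path + "dc:title", namespaces=NS),
            "subjects": [el.text for el in record.findall(dc_path + "dc:subject", NS)],
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
```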

Linking DSpace and OpenSearch?

Has anyone hooked up OpenSearch and a DSpace repository?

We're just about to start using a DSpace repository for collections data – object metadata, media files and metadata and information record (people, places, events, publications) metadata – for selected records from our Mimsy XG collections management system; and I think an OpenSearch service would make the data a lot more findable and possibly a lot more useable.

I really should write it up properly at some stage, but I'm hoping that our repository will have a use beyond providing an OAI-PMH-compliant data source for partnership projects and our own internal requirements.

For example, other people may query the repository to build applications with our data; or use it as a central index of all the records we've published in digital projects over the years, following links to sites in which the object appears. Or it might enable us to try some semantic web-ish things…
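To show what an OpenSearch service might look like from the outside, here is a small Python sketch of how a client fills in an OpenSearch URL template. The `{searchTerms}` and `{startPage?}` placeholders follow the OpenSearch 1.1 template syntax, where a trailing `?` marks a parameter as optional; the search address itself is a made-up example, not something DSpace or our repository provides.

```python
import re
from urllib.parse import quote

# Hypothetical template of the kind published in an OpenSearch
# description document; the host and path are invented.
TEMPLATE = ("https://example.org/collections/search"
            "?q={searchTerms}&page={startPage?}&format=rss")

# Matches {name} and {name?} placeholders.
_PARAM = re.compile(r"\{([A-Za-z:]+)(\??)\}")


def fill_template(template, **values):
    """Substitute URL-encoded values into an OpenSearch URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters without a value become empty strings.
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")
    return _PARAM.sub(substitute, template)


url = fill_template(TEMPLATE, searchTerms="roman london")
```

Because the template is published in a small XML description file, browsers and aggregators can discover and query the search without any custom integration work.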

I'd be curious to hear about anyone's experience with DSpace/OAI-PMH or OpenSearch for museum collection data, but I'd particularly love to hear from you if you've used them together.