Slides and talk from 'Cosmic Collections' paper

This is a lazy post, a straight copy and paste of my presentation notes (my excuse is that I'm eight days behind on everything at work and uni after being grounded in the US by volcanic ash). Anyway, I hope you enjoy it or that it's useful in some way.

Cosmic Collections: creating a big bang?


Slide 1 (solar rays – Cosmic Collections):

The Cosmic Collections project was based on a simple idea – what if we gave people the ability to make their own collection website? The Science Museum was planning an exhibition on astronomy and culture, to be called ‘Cosmos & Culture’. We had limited time and resources to produce a site to support the exhibition and we risked creating ‘just another exhibition microsite’. So what if we provided access to the machine-readable exhibition content that was already being gathered internally, and threw it open to the public to make websites with it?  And what if we motivated them to enter by offering competition prizes?  Competition participants could win a prize and kudos, and museum audiences might get a much more interesting, innovative site.
The idea was a good match for museum mission, exhibition content, technical context, hopefully audience – but was that enough?
Slide 2 (satellite dish):
Questions…
If we built an API, would anyone use it?
Can you really crowdsource the creation of collections interfaces?
The project gave me a chance to investigate some specific questions.  At the time, there were lots of calls from some quarters for museums to produce APIs for each project, but would anyone actually use a museum API?  The competition might help us understand whether or how we should invest in APIs and machine-readable data.
We can never build interfaces to meet the needs of every type of audience.  One of the promises of machine-readable data is that anyone can make something with your data, allowing people with particular needs to create something that supports their own requirements or combines their data with ours – but would anyone actually do it?
Slide 3 (map mashup):
Mashups combine data from one or more sources and/or data and visualisation tools such as maps or timelines.
I'm going to get the geek stuff out of the way and quickly define mashups and APIs…
Mashups are computer applications that take existing information from known sources and present it to the viewer in a new way. Here’s a mashup of content edits from Wikipedia with a map showing the location of the edit.
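For the geeks in the room, here is a minimal sketch of what a mashup like that does under the hood: take records from one source, look each one up in a second source, and hand the combined result to a map. Everything in it is made up for illustration (the edits, the IP addresses from documentation ranges, the `locations` lookup and the `mash_up` function); it is not how the Wikipedia example was actually built.

```python
# First source: a made-up list of page edits (not real Wikipedia data).
edits = [
    {"page": "Astrolabe", "editor_ip": "192.0.2.10"},
    {"page": "Caroline Herschel", "editor_ip": "198.51.100.7"},
    {"page": "Pilot ACE", "editor_ip": "203.0.113.5"},
]

# Second source: a made-up lookup from IP address to place and coordinates.
locations = {
    "192.0.2.10": {"place": "London", "lat": 51.5, "lon": -0.13},
    "198.51.100.7": {"place": "Sydney", "lat": -33.87, "lon": 151.21},
}


def mash_up(edits, locations):
    """Join each edit to its location, skipping edits we cannot place."""
    markers = []
    for edit in edits:
        location = locations.get(edit["editor_ip"])
        if location is None:
            continue  # no location known for this edit, so no map marker
        markers.append({"label": edit["page"], **location})
    return markers


markers = mash_up(edits, locations)
for marker in markers:
    print(f'{marker["label"]}: {marker["place"]} ({marker["lat"]}, {marker["lon"]})')
```

The list of markers is what you would pass to a mapping tool; the interesting part is the join between two sources that were never designed to go together.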
Slide 4 (APIs)
APIs (Application Programming Interfaces) are a way for one machine to talk to another: ‘Hi Bob, I’d like a list of objects from you, and hey, Alice, could you draw me a timeline to put the objects on?’
APIs tell a computer, 'if you go here, you will get that information, presented like this, and you can do that with it'.
A way of providing re-usable content to the public, other museums and other departments within our museum – we created a shared backend for web and gallery interactives.
I think of APIs as user interfaces for developers and wanted to design a good experience for developers with the same care you would for end users*.  I hoped that feedback from the competition could be used to improve the beta API
* we didn’t succeed in the first go but it’s something to aim for post-beta
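The 'if you go here, you will get that information, presented like this' idea can be sketched in a few lines. This is an illustration only: the `api.example.org` address, the parameter names and the shape of the response are all assumptions, not the Science Museum API, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: not the museum's real API address.
BASE_URL = "https://api.example.org/objects"


def build_request(**params):
    """'If you go here...': build the address a client would request."""
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


# '...you will get that information, presented like this': a made-up response.
SAMPLE_RESPONSE = """
{"objects": [
  {"id": "obj-1", "title": "Astrolabe", "date": "1650"},
  {"id": "obj-2", "title": "Telescope", "date": "1785"}
]}
"""


def parse_titles(body):
    """'...and you can do that with it': pull out what an interface needs."""
    return [item["title"] for item in json.loads(body)["objects"]]


url = build_request(theme="astronomy", limit=2)
titles = parse_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

A developer only needs the documented address and the documented response shape, which is why I think of that documentation as the user interface.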
Slide 5: (what if nobody came?)
AKA 'the fears and how to deal with them'
Acknowledge those fears
Plan for the worst case scenario
Take a deep breath and do it anyway
And on the next slides, the results.  If I was replicating the real experience, you’d have several nerve-biting months while you waited for the museum to lumber into gear, planned the launch event, publicised the project in the participant communities… Then waited for results to come in. But let’s skip that bit…
Slide 6: (Ryan Ludwig's http://www.serostar.com/cosmic/)
The results – our judges declared a winner and a runner-up. These are screenshots; this one is the second-prize-winning entry.
People came to the party. Yay! I'd like to thank all the participants, whether they submitted a final entry or not. It wouldn't have worked without them.
Slide 7: (Natalie and Simon's http://cosmos.natimon.com/)
This is a screenshot from the winning site – it made the best use of the API and was designed to lure the visitor in and keep drawing them through the site.
(We didn’t get subject specialists scratching their own itch – maybe they don’t need to share their work, maybe we didn’t reach them. Would like to reach researchers, let them know we have resources to be used, also that they can help us/our audiences by sharing their work)
Slide 8: (astrolabe – what did we learn?)
People need (more) help to participate in a geektastic project like this
The dynamics of a competition are tricky
Mashups are shaped by the data provided – you get out what you put in
Can we help people bring their own content to a future mashup?
Slide 9: (evaluation)
I did a small survey to evaluate the project… Turns out the project was excellent outreach into the developer community. People were really excited about being invited to play with our data.  My favourite quote: "The very idea of the competition was awesome"
Slide 10: (paper sheet)
Also positive coverage in technical press. So in conclusion?
Slide 11: (Tim Berners-Lee):
“The thing people are amazed about with the web is that, when you put something online, you don’t know who is going to use it—but it does get used.”
There are a lot of opportunities and excitement around putting machine-readable data online…
Slide 12: Tim Berners-Lee 2:
But:  It doesn’t happen automatically; It’s not a magic bullet
But people won't find and use your APIs without some encouragement. You need to support your API users. People outside the museum bring new ideas but there's still a big role for people who really understand the data and audiences to help make it a quality experience…
Slide 13 (space):
What next?
Using the feedback to focus and improve collection-wide API
Adding other forms of machine-readable data
Connecting with data from your collections?
I've been thinking about how to improve APIs – offer subject authorities with links to collections, embed markup in the collections pages to help search engines understand our data…
I want more! The more of us with machine-readable data available for re-use, the better the cross-collections searches, the region or specialism-wide mashups… I'd love to be able to put together a mashup showing all the cultural heritage content about my suburb; all the Boucher self-portraits; all the inventions that helped make the Space Shuttle work…
Slide 14: (thank you)
If you're interested in possibilities of machine-readable data and access to your collections, join in the conversation on the museum API wiki or follow along on twitter or on blogs.  Join in at http://museum-api.pbworks.com/
More at https://openobjects.org.uk/ or @mia_out

Image credits include:
http://antwrp.gsfc.nasa.gov/apod/ap100415.html
http://antwrp.gsfc.nasa.gov/apod/ap100414.html
http://antwrp.gsfc.nasa.gov/apod/ap100409.html
http://antwrp.gsfc.nasa.gov/apod/ap100209.html
http://antwrp.gsfc.nasa.gov/apod/ap100315.html
http://www.sciencemuseum.org.uk/Centenary/Home/Icons/Pilot_ACE_Computer.aspx
http://www.prospectmagazine.co.uk/2010/01/mash-the-state/

MW2010 machine-readable data unconference session

This was originally posted on the 'Museums and the machine-processable web' wiki.

This is a rough report from an unconference session on RDFa, microformats and museum data held during Museums and the Web 2010.

I'm writing it up later than I intended (blame the volcano) so please excuse any mistakes in writing up, misattributions, etc – you can sign in to edit them yourself, leave a comment or drop me a line (contact details on the register your interest page).

I'm also writing it up just before I head to the airport, so this first version won't be complete – do jump in and add your own notes if you were there (or wanted to be).

We started by introducing ourselves and briefly describing our interest in the session.

Those present were: Richard Urban, Nate Solas, Paul Hagon, Peter Goodall, Bart Grob (?), Ilya, Piotr Adamczyk, Richard Morgan, Paul Rowe, Darren Scott, Erich Schroeder, Patrick Schmitz, Gunter Waibel…

Interests included: inference rules based on metadata, embedding metadata in webpages, and breaking through the 'analysis paralysis' and choosing a standard to implement (even if it wasn't perfect).

What problems are people having? Picking a standard!

What issues arose during the unconference?

There was an interesting tension between the 'just do it, near enough is good enough' and the 'let's wait until we've got the standard right' impulses – as museum technologists I guess many of us are a mixture of both. But there was also a feeling that we should find a way to move beyond the questions to the point where we start implementing something, with an eye to having a demonstrator project available by this time next year (so April 2011).

We made a useful distinction between a lightweight shared 'standard' that aimed to increase the discoverability of content, and more heavyweight standards that might be used internally or implemented with particular uses in mind. This distinction allows us to keep working through the issues to come up with a suitable (usable, robust, sustainable, implementable, accurate) long-term solution while trying out existing or ad hoc standards in the shorter term.

The voices of reason

One of the reasons I was so happy with this unconference session is that all kinds of people contributed commonsense warnings from their various domains and experiences.  Piotr and Richard said they were still looking for the things that could be done in RDFa that couldn't be done with existing infrastructure.

The use cases

Providing use cases helps everyone understand what we each want to do with the data as well as what we have in our collections.

Peter Goodall wants to make it easy for museums to do mashup collections.

Piotr is still looking for what can be done in RDFa that can't be done with existing infrastructure…

Ilya – neighbourhood project – Open Source Software Foundry – implemented RDFa as a demonstration – FOAF is a format to describe social networks and DOAP is a description of a project. What kind of aggregation service could we endorse to harvest from our collections?

One of mine: Caroline Herschel (1750–1848) was an astronomer, and there's content about her in lots of museums across the world. I've encountered her in Brooklyn Museum, the National Maritime Museum, the National Portrait Gallery… I'd love to link to images and content from all those other museums from our page about her – but how would I find that content, and how could I reliably link to it?

Erich from Illinois State Museum – was working on an oral history project on agriculture, indexed to a really detailed level – wants to provide the user with a proper citation for an interview clip. Found Zotero but only got as far as that.

Gunter: OAI-PMH and CDWA-Lite on last project; writing tips for museums working on stuff like this.

FOAF? Richard, V&A – just done collections online with an API that wasn't really standards-based.  Is with Piotr – we should just be able to do this stuff with NLP and text mining – also interested in FOAF.  FOAF sounds like a winner as we know there are people out there looking for people's names.

Peter Goodall – large db of people to disambiguate names.  Paul – playing with FOAF – someone made a FOAF generator from their API.  Paul Rowe – NZ museums project – looking at terminologies and overlaps.
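A FOAF generator along the lines Paul mentioned could be quite small. This is a sketch under my own assumptions, not the generator he was referring to: the record, its field names and the example.org addresses are hypothetical, and it only emits three real FOAF terms (`foaf:Person`, `foaf:name` and `foaf:page`) in Turtle syntax.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def person_to_foaf(record):
    """Turn one person record into a FOAF description in Turtle syntax."""
    lines = [
        f"@prefix foaf: <{FOAF_NS}> .",
        "",
        f"<{record['uri']}> a foaf:Person ;",
        f'    foaf:name "{escape(record["name"])}"',
    ]
    if record.get("page"):
        # Only link to a web page when the record actually has one.
        lines[-1] += " ;"
        lines.append(f"    foaf:page <{record['page']}>")
    lines[-1] += " ."
    return "\n".join(lines)


# Hypothetical record, shaped like something a collections API might return.
person = {
    "uri": "https://collections.example.org/people/caroline-herschel",
    "name": "Caroline Herschel",
    "page": "https://collections.example.org/people/caroline-herschel.html",
}

turtle = person_to_foaf(person)
print(turtle)
```

Note that this only says who a person is and where to read about them, which is exactly the limit Patrick raises below: roles and relationships need something more.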

Or maybe not FOAF… Patrick from CollectionSpace and UC Berkeley – in a past life has done lots of semantic work but has reservations about RDFa. Worries about vocabs, e.g. Dublin Core, that turn out to be irreconcilable but once embedded make it hard to do more serious things. Interested in reasoning and inferencing across collections. Ontologies are a point of view; he doesn't believe you can have a universal point of view. Use NLP (natural language processing) to index collections from a given community. Interesting to explore the use cases more specifically, e.g. compelling cases around events. FOAF doesn't let you model the different types of relationships and roles that one person may fulfil – an example of how it's hard to shift a community to something more refined once a model is in place. Potential to generate multiple points of view with different vocabs; use cases will help him understand.

What next? AKA, getting on with it

Testing standards – I'm really up for implementing something on our existing pages. I was thinking that a comparison of two different standards, both marked up as RDFa on existing Science Museum/NMSI web pages (Dublin Core on Ingenious and LIDO on Making the Modern World), would help provide some useful data on the utility of the approach and the beginning of a comparison between standards.  I've written about it a bit at http://museum-api.pbworks.com/Science-Museum-linked-data – it's a very unfinished document but if you've got suggestions for making it better I'd love to hear them.
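To give a flavour of what Dublin Core marked up as RDFa on a collections page could look like, here is a minimal sketch. The record, its field names and the example.org address are made up for illustration, and I've assumed the RDFa 1.1 `prefix` attribute and the Dublin Core terms namespace; it is not how the Ingenious pages are actually built.

```python
from html import escape

DC_TERMS = "http://purl.org/dc/terms/"


def object_to_rdfa(record):
    """Render one object record as an HTML fragment with RDFa attributes."""
    about = escape(record["uri"], quote=True)
    rows = [
        # 'prefix' declares the vocabulary; 'about' names the thing described.
        f'<div prefix="dcterms: {DC_TERMS}" about="{about}">',
        f'  <h2 property="dcterms:title">{escape(record["title"])}</h2>',
        f'  <p>Made by <span property="dcterms:creator">{escape(record["maker"])}</span>,',
        f'  <span property="dcterms:date">{escape(record["date"])}</span></p>',
        "</div>",
    ]
    return "\n".join(rows)


# Hypothetical record; the field names are assumptions, not a real schema.
record = {
    "uri": "https://collections.example.org/objects/1234",
    "title": "Orrery & stand",
    "maker": "Unknown maker",
    "date": "1750",
}

fragment = object_to_rdfa(record)
print(fragment)
```

The visible page stays the same for human readers; the extra attributes are what a harvester or search engine would pick up.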

[My notes get sketchy from here on in because I'm returning to them after a few months, and some use cases may have ended up in this section, but that's probably ok]

It was suggested that versioning could be a way of dealing with the fact that we don't have a perfect standard right now – it could allow us to iterate through various prototypes and demonstrators until we get something good, while not breaking projects that are built in the meantime.
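One way to picture the versioning suggestion, as a sketch only: each iteration of the format keeps its own serialiser, and a consumer asks for the version it was built against. The field names, the two versions and the `serialise` function are hypothetical; nobody in the session proposed this exact design.

```python
# Hypothetical field names for two iterations of a shared record format.
def to_v1(record):
    """First prototype: free-text title and maker only."""
    return {"version": 1, "title": record["title"], "maker": record["maker"]}


def to_v2(record):
    """Second prototype: adds a maker identifier, keeps every v1 field."""
    return {**to_v1(record), "version": 2, "maker_id": record["maker_id"]}


SERIALISERS = {1: to_v1, 2: to_v2}
LATEST = max(SERIALISERS)


def serialise(record, version=LATEST):
    """Return the record in the requested version of the format."""
    if version not in SERIALISERS:
        raise ValueError(f"unsupported version: {version}")
    return SERIALISERS[version](record)


record = {
    "title": "Astrolabe",
    "maker": "Unknown maker",
    "maker_id": "https://authority.example.org/people/unknown",
}

old = serialise(record, version=1)  # what a project built last year still gets
new = serialise(record)             # what a new project gets by default
print(old)
print(new)
```

Because the second version only adds fields, anything built against the first keeps working while we keep iterating.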

Microformats – Paul Hagon has used them on events (and other stuff?); Nate pointed out that they're used by Google and Yahoo.

Richard – maybe work on a new 'do one thing' challenge.

Dublin Core is 'messy'.  Patrick: 'is a little better than tagging'.
Peter – interested in using really dumb taxa cos people catalogue inconsistently anyway.
Patrick – taxa even in life sciences don't agree.
Something that's good enough vs something perfect.
Map to shared system with mapping to the authorities used to back things up.
PS: instead of describing a free concept, e.g. a pig, say 'a pig', and when we say pig, we mean it as defined in this name authority.
GW: identifier-based systems.
How much do we aim for perfection?
PS: don't tie yourself to a syntax that doesn't allow for that.
NS: What can we solve today?
PS: don't want to say figure everything out before you start but consider later options.
NS: let's do something lightweight – add RDFa to marked up pages.
Peter G: interested in something really simple… really interesting thing is the objects – being able to refer to the identity of an object from a pictorial representation.

LIDO as vocab that works for social history museums and not just art galleries; Dublin Core as quick win.


NS: if we provide enough good enough markup… PA: satisficing approach.
WordNet as term, authority list.
Grappling with issues around how lightweight/heavyweight to go that allows useful exchange of records/assertions.
PS: can I pivot across museums based on some RDFa tags?
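The notes above about mapping to authorities, identifier-based systems and pivoting across museums fit together roughly like this. It is a sketch under stated assumptions: the authority addresses, the museums and the records are all invented, and real terminology matching is much messier than a lower-cased dictionary lookup.

```python
from collections import defaultdict

# Hypothetical authority: several local terms map to one shared identifier.
AUTHORITY = {
    "pig": "https://authority.example.org/taxa/pig",
    "domestic pig": "https://authority.example.org/taxa/pig",
    "swine": "https://authority.example.org/taxa/pig",
    "astrolabe": "https://authority.example.org/objects/astrolabe",
}

# Hypothetical records from two museums that catalogue inconsistently.
records = [
    {"museum": "Museum A", "title": "Ceramic pig", "subject": "Pig"},
    {"museum": "Museum B", "title": "Farm photograph", "subject": "swine"},
    {"museum": "Museum B", "title": "Brass astrolabe", "subject": "Astrolabe"},
    {"museum": "Museum A", "title": "Unlabelled sketch", "subject": "misc"},
]


def pivot_by_authority(records, authority):
    """Group records from any museum under the identifier they share."""
    grouped = defaultdict(list)
    for record in records:
        identifier = authority.get(record["subject"].strip().lower())
        if identifier is not None:  # terms with no authority match are left out
            grouped[identifier].append((record["museum"], record["title"]))
    return dict(grouped)


pivot = pivot_by_authority(records, AUTHORITY)
for identifier, items in pivot.items():
    print(identifier, items)
```

Once 'Pig' and 'swine' resolve to the same identifier, the pivot across museums is just a grouping step, which is the 'good enough' end of the scale rather than the perfect one.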

[So as you can see, there were no solid conclusions and we didn't leave with an agreement "let's all try implementing x".  I still like the idea of an MW2010 challenge, ideally something you can participate in as a publisher or consumer of data… Suggestions?]

Some thoughts on linked data at the Science Museum – thoughts in progress

I originally posted this on the Science Museum developers blog.

I’ve posted on twitter and my personal blog but forgot to post over here (tsk) – I’ve written some very-much-in-progress thoughts on how the Science Museum could work with linked data/APIs to improve our machine-readable data offerings at the museum data wiki.

I’m particularly interested in finding the balance between a solution we can achieve in the medium-term and something that works with standards as much as possible.

It’s nearly time for the Museums and the Web 2010 conference, where questions like this might be addressed in one of the unconference sessions so I’d love to hear your thoughts.

'Cosmic Collections' – my MW2010 paper online

My Museums and the Web 2010 paper is up at Cosmic Collections: Creating a Big Bang. I'm working on the slides now and I'm curious – what would you like to see more of in a presentation?  It's only short (6 minutes) so I'm currently thinking setup (including lots of definitions for non-geeks), outcomes (did the project succeed?), and a bit on what I think the next steps are (basically a call to get your data online in re-usable formats).

I'm thinking of leading with this Tim Berners-Lee quote from an article in Prospect, Mash the state:

"The thing people are amazed about with the web is that, when you put something online, you don't know who is going to use it—but it does get used."