Geek for a week: residency at the Powerhouse Museum

I've spent the last week as 'geek-in-residence' with the Digital, Social and Emerging Technologies team at the Powerhouse Museum. I wasn't sure what 'geek-in-residence' would mean in reality, but in this case it turned out to be a week of creativity, interesting constraints and rapid, iterative design.

When I arrived on Monday morning, I had no idea what I'd be working on, let alone how it would all work. By the end of the first day I knew how I'd be working, but not exactly what I'd focus on. I came in with fresh questions on Tuesday, and was sketching ideas by lunchtime. The next few days were spent getting stuck into wireframes to focus in on specific issues within that problem space: I turned initial ideas into wireframes and basic copy, and put them through two rounds of quick-and-dirty testing with members of the public and Powerhouse volunteers. By the time I left on Friday I was able to hand over wireframes for a site called 'conversations about collections' which aims to record people's memories of items from the collection. (I ran out of time to document the technical aspects of how the site could be built in WordPress, but given the skills of the team I think they'll cope.)

The first day and a half were about finding the right-sized problem. In conversations with Paula (Manager of the Visual & Digitisation services team) and Luke (Web Manager), we discussed what each of us was interested in exploring, looking for the intersection between what was possible in the time and with the material to hand.

After those first conversations, I went back to Powerhouse's strategy document for inspiration. If in doubt, go back to the mission! I was looking for a tie-in with their goals – luckily their plan made it easy to see where things might fit. Their strategy talked about ideas and technology that have changed our world and stories of people who create and inspire them, about being open to 'rich engagement, to new conversations about the collections'.

I also considered what could be supported by the existing API, what kinds of activities worked well with their collections and what could be usefully built and tested as paper or on-screen prototypes. Like many large collections, most of the objects lack the types of data that support deeper engagement for non-experts (though the significance statements that exist are lovely).

Two threads emerged from the conversations: bringing social media conversations and activity back into the online collections interfaces to help provide an information scent for users of the site; and crowdsourcing games based around enhancing the collections data.
The first was an approach to the difficulties in surfacing the interesting objects in very large collections. Could you create a 'heat map' based on online activity about objects to help searchers and browsers spot objects that might be more interesting?

At one point Nico (Senior Producer) and I had a look at Google Analytics to see what social media sites were sending traffic to the collections and suss out how much data could be gleaned. Collection objects are already showing up on Pinterest, and I had wild thoughts about screen-scraping Pinterest (they have no API) to display related boards on the OPAC search results or object pages…

I also thought about building a crowdsourcing game that would use expert knowledge to improve the data and so make better games possible for the general public – this would be an interesting challenge, as open-ended activities are harder to score automatically so you need to design meaningful rewards and ensure an audience to help provide them. However, it was probably a bigger task than I had time for, especially with most of the team already busy on other tasks, though I've been interested in that kind of dual-phased project since my MSc project on crowdsourcing games for museums.

But in the end, I went back to two questions: what information is needed about the collections, and what's the best way to get it? We decided to focus on conversations, stories and clues about objects in the collections with a site aimed at collecting 'living memories' about objects by asking people what they remember about an object and how they'd explain it to a kid. The name, 'Conversations about collections', came directly from the strategy doc and was just too neat a description to pass up, though 'memory bank' was another contender.

I ended up with five wireframes (clickable PDF at that link) to cover the main tasks of the site: to persuade people (particularly older people) that their memories are worth sharing, and to get the right object in front of the right person. Explaining more about the designs would be a whole other blog post, but in the interests of getting this post out I'll save that for another day… I'm dashing out this post before I head out, but I'll update in response to questions (and generally tidy things up when I have more time).

My week at the Powerhouse was a brilliant chance to think through the differences between history of science/social history objects and art objects, and between history and art museums, but that's for another post (perhaps if I ever get around to posting my notes from the MCN session on a similar topic).

It also helped me reflect on my interests, which I would summarise as 'meaningful audience participation' – activities that are engaging and meaningful for the audience and also add value for the museum, activities that actually change the museum in some way (hopefully for the better!), whether that's through crowdsourcing, co-curation or other types of engagement.

Finally, I owe particular thanks to Paula Bray and Luke Dearnley for running with Seb Chan's original suggestion and for their time and contributions to shaping the project; to Nicolaas Earnshaw for wireframe work and Suse Cairns for going out testing on the gallery floor with me; and to Dan Collins, Estee Wah, Geoff Barker and everyone else in the office and on various tours for welcoming me into their space and their conversations.

Photo: behind the scenes at the (then) Powerhouse Museum, Sydney

Report from 'What's the point of a museum website' at MCN2011

A really belated report from the 'What's the point of a museum website?' panel I was part of with Koven Smith (@5easypieces), Eric Johnson (@ericdmj), Nate Solas (@homebrewer) and Suse Cairns (@shineslike) at last November's Museum Computer Network (MCN2011) conference. I've written up some of my own thoughts at Brochureware, aggregators and the messy middle: what's the point of a museum website? – this post is about the discussion during the panel itself. There was a lot of audience participation (in the room and on twitter), which made tackling a summary of the discussion really daunting, so I've given up on trying to capture every thread of conversation and am just reporting from the notes I took at the time.

It's all a bit of a blur now so it's hard to remember exactly how the conversations went, but from my notes at the time, it included: Clay Shirky on social objects as a platform for conversation; games and other online experiences as big draws for museum sites (trusted content is a boon for parents); the impact of social media making the conversations people have always had about exhibitions and objects visible to curators and others; and the charisma of the physical object. From the audience Robin White Owen mentioned the potential for mobile apps to create space, opportunity for absorption and intimate experiences with museum content, leading me to wonder if you can have a Stendhal moment online?

Is discoverability the new authority for museum websites? As Nate said, authority online lies in being active online, though we also need to differentiate between authority about objects and narratives, and cite our sources for statements about online collections. (See also Rob Stein on the difference between being authoritarian and authoritative). But maybe that's challenging too – perhaps museums aren't good at saying there is no right answer because we like to be the one with the right answer. Someone mentioned 'communities of passion' gathered around specific objects, which is a lovely phrase and I'm sorry I can't remember who said it. Someone else from the audience wisely said, it's 'not how do I drive people to my collection, but how do I drive my collection to them'. Andrew Lewis talked about 'that inspiration moment' triggered in a museum that sends you hurrying back home to make art or craft something.

I talked about my dream of building a site that people would lose themselves in for hours, just as you can do on Wikipedia now after starting with one small query. How can we build a collections online site where people can follow one interesting-looking object or story after another? We can't do that without a critical mass of content, and I suspect this can only be created by bringing different museum collections together digitally (or as Koven called it, digital repatriation), which also gets around the random accidents of collecting history that mean related objects are isolated in museums and galleries around the world. Also, we're only ever part of the audience's session online – we might be the start, or the end, but we're more likely to be somewhere in the middle. We should be good team players and use our expert knowledge to help people find the best information they can.

Looking back, a lot of the conversation appears to be about how to create the type of rich experience of being in the presence of an object – a moment in time as well as in space – from the currently flat experience of looking at an object in an online catalogue (particularly when the online environment has all the distractions of kitten videos and social media notifications). Can storytelling or bite-sized bits of content about objects act as 'hooks' to enable reflection and learning online? Hugh Wallace has used the phrase 'snackable content' for readily available content that fits into how people use technology, and I think (with my conversational, social history bias) that stories-as-anecdotes can be a great way of sharing information about collections while creating that self-contained moment in time. (And yes, I am side-stepping Walter Benjamin's statement that 'that which withers in the age of mechanical reproduction is the aura of the work of art'. Not that he was in the room, but he does tend to haunt these conversations.)

As with many conversations about online visitors, the gap between what we know and what we should know is frustratingly large, and we still don't know how large the gap is between what (particularly) collections online are and what they could be. Someone said that we're (measuring, or talking about) what users currently do with what we give them, not what they really want to do. Bruce Wyman tweeted, 'current visitors most frequently give *incremental* ideas. You need different folk to take those great leaps forward. That's us'. Rob Stein said he didn't care about measuring time online, but wanted to be able to measure epiphanies – an excellently provocative statement that generated lots of discussion, including comments that epiphany needs agency, discourse, and serendipity. Eric said we murder epiphany by providing too much information, but others pointed out that epiphanies are closely tied to learning, so maybe it's a matter of the right information at the right time for the right person and a good dose of luck.

So (IMO) it was a great panel session, but did we come up with an answer for 'what's the point of a museum website'? Probably not, but it's clearly a discussion worth having, and I dare say there were a few personal epiphanies during the session.

I'm collecting other posts about the session and will update this as I find them (or let me know of them in the comments): Suse's Initial takeaways from MCN2011. I also collated some of the tweets that used the session hashtag 'wpmw' in a document available (for now) via my dropbox.

Finally, thank you to everyone who attended or followed via twitter, and particular thanks to my fellow panelists for a great discussion.

Documentation for collections data from Science Museum, National Media Museum, National Railway Museum (NMSI) released as CSV

I originally posted this on the Science Museum API documentation wiki.

About this data

These data sets contain information about objects from the collections of the Science Museum, the National Media Museum and the National Railway Museum. These datasets include many items not on display in our galleries, as well as authority records about related people and organisations, events and image files.

The collections include objects relating to aeronautics, agriculture, astronomy, cinematography, medicine, materials, space, television, time measurement, transport and more. They range in size from contact lenses to Concorde 002.

We've published three data sets:

218,822 object records (currently in 4 files, each up to 15mb) (NMSI_object1_20110304.csv, NMSI_object2_20110304.csv, NMSI_object3_20110304.csv, NMSI_object4_20110304.csv)
40,596 media records (metadata about images already published online) (NMSI_media_20110304.csv)
173 event records (NMSI_events_20110304.csv)

We hope to publish our lists of c9000 people and organisations related to these objects soon, alongside a table linking objects to events.

The data is supplied in CSV (comma-separated values) format, exported from Excel. The first line of each file contains the field headings. Files may be up to 15mb in size.
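As a rough illustration of working with files in this shape, the sketch below reads rows into dictionaries keyed by the field headings on the first line, using Python's standard csv module. The sample row is invented for the example (only the column names come from this documentation), and the helper name read_records is hypothetical.

```python
import csv
import io


def read_records(csv_file):
    """Yield each row as a dict keyed by the field headings on line one."""
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield row


# Tiny in-memory stand-in for one of the object files.
# The values are invented; only the column names are from the documentation.
sample = io.StringIO(
    'ID_NUMBER,ITEM_NAME,TITLE\r\n'
    '1999-719,locomotive,"Example title, with a comma"\r\n'
)

records = list(read_records(sample))
print(records[0]["TITLE"])
```

For a real file you would pass an open file handle instead, for example open(path, newline="", encoding=...). The character encoding of the exports isn't stated here, so you may need to experiment.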

The data is released under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) licence (http://creativecommons.org/licenses/by-nc-sa/3.0/). Please contact us if you would like to use this data under different conditions.

Why we're releasing the data

We have been providing access to a searchable database of our collections online at http://collectionsonline.nmsi.ac.uk/ for some time now, but through staff attendance at various hack days, we've learned that this interface does not support programmatic search or exploration of the data. We've also learned (through the Cosmos & Culture project) that a number of people found the XML provided by the default .Net service that published the API too complex. CSV is a very simple format, accessible to a wider range of people. We hope that it will be usable by most people.

We're publishing the data in CSV format now as a relatively lightweight experiment. We'd like to understand whether, and if so, how, people would use our data. We'd also like to explore the benefits for the museum and for programmers using our data – your feedback would inform decisions about future investment in more structured data as well as helping shape our understanding of the requirements of those users.

We hope you will be creative with it, but please use it responsibly. If you're not sure whether the museum would be comfortable with your idea, please drop us a line to discuss it.

How you can help

You can help us to improve this resource – let us know if you have any information about our objects, or if you find any errors, though we will probably not republish this data set in the short-term. Please quote the Object Number/s and email: Collections.Online@nmsi.ac.uk

We'd like this experiment to help us understand the needs of potential users but we can only do that with your help – we'd love to hear your comments on how you've used the data, and how we could improve it. If possible, we'd like to feature mashups or other applications made with our data. Please email us at web.team@nmsi.ac.uk, send @sciencemuseum a message on twitter or leave a comment at http://sciencemuseumdiscovery.com/blogs/museumdev.

Objects

NMSI_object1_20110304.csv, NMSI_object2_20110304.csv, NMSI_object3_20110304.csv, NMSI_object4_20110304.csv.

Column title	What is it?
ID_NUMBER	The unique identifier for a record, based on the museum's own accession number. The number may refer to a single object or (historically) to a collection of objects.
ITEM_NAME	Object name – a simple name or common name. Where possible this is from an established thesaurus (i.e. http://museum-api.pbworks.com/f/NMSI_draft200903_object_name.csv)
TITLE	A short one-line caption or brief description of the object, derived from the existing data. The title should be a summary capturing the essence of an object. Often includes related place and date.
MAKER	The name of the person or company or other organisation that made the object. The Maker field is indexed and linked to the People/Organisation records (to be released shortly) – links should be made by matching strings (internal IDs are not available).
DATE_MADE	The date when an object was made (production date). Dates should be recorded consistently and ranges should be in the format <earlier year>-<later year> e.g. 1671-1700. Approximate dates are written as e.g. c. 1936. This field also contains various strings, including ‘Unknown'.
PLACE_MADE	Place names are indexed in the database and linked into a hierarchy (Getty Thesaurus of Geographic Names with in-house modifications i.e. http://museum-api.pbworks.com/f/NMSI_draft200903_place.csv) and should be recorded consistently because they are derived from a term list. Where known with certainty or reasonable probability the town or city of production is recorded. As a minimum the nation/country of origin or the probable nation/country of production should be recorded. If there is some uncertainty this can be explained in the general description.
MATERIALS	Records what the object is made of and what part of the object is made of that material.
MEASUREMENTS	Record the type of measurements that are most useful for an object, with ‘overall' being the most usual dimensions recorded. Overall will be the amount of space the object takes up when it first arrives in the museum and is stored. Measurements must be recorded consistently in metric units. Compulsory measurements are Size and Weight. The default units of measurement are millimetres and kilograms. Example: overall: 51 mm x 95 mm x 80 mm, 0.371kg,
DESCRIPTION	In this field we try to record the what, when, why, where and who information about the object: what it is, what it does, what it is made of, who made it, where it was made and what makes it unique. This field should be exported as plain text (without markup). The information here is used by the museum to audit an object so it should be described well with each part defined. It should also contain all the information about the object so that an interpreted description can be written (suitable for publication). Technical terms have been avoided as far as possible. Names, dates, places and significant events should be recorded here in a normalized form but will also be recorded in other indexed fields. As far as possible the following are recorded: <number of objects> <name of object, qualifier> <model name, number> <what is the type of object?> <specific information>:<made by…> <type of object> <place made> <date made> <any associated relevant fact> <materials> <colour><serial number><containers> <accessories> <dimensions> <condition and completeness> <identification of parts> <acquisition/provenance information> <story of display, conservation etc.> <other details>
WHOLE_PART	Mostly an internal field.
COLLECTION	A broad subject specialism applied during the Acquisition/ Entry process. NMeM National Media Museum NRM National Railway Museum SCM Science Museum. Collection terms are listed at http://museum-api.pbworks.com/w/page/36515349/NMSI-Collections-list

For more information on authority records, see http://en.wikipedia.org/wiki/Authority_control
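The DATE_MADE field in the objects table mixes year ranges, approximate dates and free text, so it usually needs some tidying before you can sort or filter on it. The sketch below is one possible approach, not the museum's own method: it handles only the three patterns described above (a range such as 1671-1700, an approximate date such as c. 1936, and a single year), assumes four-digit years, and returns None for anything else, including 'Unknown'. The function name parse_date_made is hypothetical.

```python
import re

# Patterns for the documented formats; four-digit years are an assumption.
RANGE = re.compile(r"^(\d{4})\s*-\s*(\d{4})$")
APPROX = re.compile(r"^c\.?\s*(\d{4})$", re.IGNORECASE)
YEAR = re.compile(r"^(\d{4})$")


def parse_date_made(value):
    """Return (earliest_year, latest_year, approximate), or None if unparsed."""
    text = value.strip()
    m = RANGE.match(text)
    if m:
        return int(m.group(1)), int(m.group(2)), False
    m = APPROX.match(text)
    if m:
        year = int(m.group(1))
        return year, year, True
    m = YEAR.match(text)
    if m:
        year = int(m.group(1))
        return year, year, False
    # Free text such as 'Unknown' falls through unparsed.
    return None


for example in ["1671-1700", "c. 1936", "Unknown"]:
    print(example, "->", parse_date_made(example))
```

Keeping the unparsed values (rather than discarding them) makes it easier to see which other strings turn up in the field and whether they deserve their own pattern.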

Media

NMSI_media_20110304.csv

This table contains information relating object records to images already published online at http://collectionsonline.nmsi.ac.uk/.

You can use it to construct URLs to images of the objects. (The images are hosted on a site built with a third-party solution so the URLs aren't ideal.)

objects.ID_NUMBER is the equivalent of media.OBJECT, giving you a link between the object and media tables (e.g. 1999-719). The media.MEDIAKEY (e.g. 125972) can then be included in a URL, e.g. the image file URL uses the media key: http://collectionsonline.nmsi.ac.uk/grabimg.php?wm=1&kv=125972
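A minimal sketch of that join in Python might look like the following. It groups image URLs by object number so they can be looked up from an object's ID_NUMBER. The sample row is assembled from the example values in this documentation and may not correspond to a real row in the file; the wm=1 parameter is simply copied from the example URL, and the function names are hypothetical.

```python
from urllib.parse import urlencode

IMAGE_BASE = "http://collectionsonline.nmsi.ac.uk/grabimg.php"


def image_url(media_key):
    """Build an image URL from a MEDIAKEY, following the documented example."""
    # wm=1 is copied from the example URL; its meaning isn't documented here.
    return IMAGE_BASE + "?" + urlencode({"wm": 1, "kv": media_key})


def images_by_object(media_rows):
    """Group image URLs by media.OBJECT (which matches objects.ID_NUMBER)."""
    grouped = {}
    for row in media_rows:
        grouped.setdefault(row["OBJECT"], []).append(image_url(row["MEDIAKEY"]))
    return grouped


# Sample row built from the documentation's example values (illustrative only).
media_rows = [
    {"MEDIA_ID": "10327065.jpg", "OBJECT": "1999-719",
     "MEDIAKEY": "125972", "CAPTION": ""},
]

urls = images_by_object(media_rows)
print(urls["1999-719"][0])
```

An object may have several images, which is why each object number maps to a list rather than a single URL.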

Column title	What is it?
MEDIA_ID	e.g. 10327065.jpg
OBJECT	The object ID_NUMBER e.g. 1999-719
MEDIAKEY	e.g. 125972
CAPTION	Optional. E.g. ‘Class 84 locomotive at Barrow Hill, sanding and filling in progress, August 1984'

Events

NMSI_events_20110304.csv

Currently this data set has fairly random coverage but we would be interested to see whether people find the content useful. If the object was linked to any significant event (historical, political, developmental or other milestone events) or if an object featured at some significant and well-known event or activity, it might be recorded in this table.

Column title	What is it?
Event Name	Includes location and date/date range.
Event Short Name	Event title without location or date (usually)
Event Category	Values include era, war, exhibition, expedition (term list?)
Occurrence Type	E.g. one-time, periodic, annual. Optional
Event Start Date	Single date as year or y/m/d. Mixed formats (sorry!). Also includes BCE dates expressed as negative integers e.g. -3100 Optional
Event End Date	As for Event Start Date. Optional
Display Date	?
Duration	Integer – use with Duration Unit. Optional
Duration Unit	E.g. days, months, years. Use with Duration. Optional
Event Description	Text. Optional
Description Source(s)	May be a URL. Optional
Sort Name	Internal use version of event name
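Because the event date columns mix bare years, y/m/d dates and negative integers for BCE years, it can help to normalise them before comparing events. The sketch below is a guess at a workable approach rather than a description of the data: it assumes y/m/d values are written with slashes in that order, and it returns plain tuples because Python's datetime.date cannot represent years before 1. Anything it doesn't recognise comes back as None so that it can be inspected by hand. The function name parse_event_date is hypothetical.

```python
import re

# Assumed shapes: a (possibly negative) integer year, or year/month/day.
YEAR_ONLY = re.compile(r"^-?\d+$")
YMD = re.compile(r"^(-?\d+)/(\d{1,2})/(\d{1,2})$")


def parse_event_date(value):
    """Return (year, month, day); month and day are None for year-only values.

    A negative year means BCE. Blank or unrecognised values return None.
    """
    text = (value or "").strip()
    if not text:
        return None
    if YEAR_ONLY.match(text):
        return int(text), None, None
    m = YMD.match(text)
    if m:
        return int(m.group(1)), int(m.group(2)), int(m.group(3))
    return None


for example in ["-3100", "1969/7/20", ""]:
    print(repr(example), "->", parse_event_date(example))
```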

Produced for the Science Museum, London. Last updated by Mia Ridge, March 2011. With thanks to the web, database and documentation teams at NMSI for their support and assistance. Thanks also to @rboulton for testing the documentation.

Documentation for collections data from Science Museum, National Media Museum, National Railway Museum (NMSI) released as CSV

I originally posted this on the Science Museum API wiki. This version dates to March 2011, as I documented things before leaving to do a PhD.


Science Museum API documentation

I originally posted this on the Science Museum API wiki in 2008; this version dates from about March 2011 (when I left the Science Museum Group to start a PhD).

At that point, the APIs available (relating to various exhibitions, collections etc.) were: Collections, Pledges, Countries, Object Wiki, Exhibitions.

These documents describe the functionality of the Science Museum APIs.

The APIs have been released as a trial. As such, they should be considered 'beta', and things may change without warning.

If you are interested in developing using these APIs, or want to ask any questions or make any suggestions about them, please email us at web.team@nmsi.ac.uk or leave a comment at http://sciencemuseumdiscovery.com/blogs/museumdev.

In addition to the APIs documented here, we have an XML-based API with objects from the exhibition Cosmos & Culture at http://www.sciencemuseum.org.uk/objectapi/cosmosculturepublic.svc/MuseumObjects.

‘Things’ and our collections data

I originally posted this on the Science Museum developers blog.

Frankie Roberto has made a web app based on the object records from the collections of the Science Museum, the National Media Museum and the National Railway Museum released yesterday. In his words:

I thought I’d have a quick play with the data last night, and so managed to import them into a database and built a quick web app called ‘Things’:
http://what-is-this.heroku.com/

The main thing I wanted out of the data was to be able to browse by type-of-thing (eg ‘steam engines’). Given that this information isn’t easily accessible from the existing data, the first thing that ‘Things’ does is ask people to help classify the objects.

It’s sort of like tagging. But easier. :-)

If I get enough things classified I may have a go at seeing if an algorithm can learn from the data and classify the rest.

Let me know what you think.

Source code is here: https://github.com/frankieroberto/things – patches welcome!

Given the number of crowdsourcing projects around*, the next step for the museum may be working out how to manage and make the most of user-created data we get back from projects like this. This would be an excellent problem to have.

* I’ve also got lots of data to hand over based on tags and facts added by people playing with the astronomy collections on Museum Metadata Games, which was again only possible because the Powerhouse Museum has an API and the Science Museum made an earlier, XML-based API.

Update on collections data and geocoded NRM data

I originally posted this on the Science Museum developers blog.

I’m glad to see the news about the release of objects from the collections of the Science Museum, the National Media Museum and the National Railway Museum has spread so far and wide already.

A few people have commented on the licence (Creative Commons Attribution-NonCommercial-ShareAlike, CC BY-NC-SA) and on the format (CSV). As tomorrow is my last day, I can’t really speak for the museum but the intention is to learn from how people use the data – the things they make, the barriers they face, etc – and iterate (as resources allow) until we get to an optimal solution (or solutions). So please get in touch if you’ve got requests or think you can help clear up some of the issues these kinds of projects face, because there’s a good chance you’ll help make a difference.

The licence is a pragmatic solution – it’s a clarification of existing terms rather than a change to our terms, because this avoided a need for legal advice, policy review, etc, that would have added several months to the process.

And yes, I know CSV is quick and dirty, but it’s effective. The museum sector is still working out how to match the resources available with the needs of mash-up type developers who work best with JSON and those who are aiming for linked open data; my hope is that your feedback on this will help museums figure out how to support people using open data in various forms. A simple solution like this also means it’s easy for the museum to re-run the export to update the data as time goes on, and that anyone, geek or not, can open the files without being startled by angle brackets and acronyms. Also, did I mention it was quick?
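For anyone who would rather work with JSON, here is a minimal sketch of turning a CSV export into JSON using only Python's standard library. The column names and the sample row are hypothetical, made up for illustration; check the documentation for the real field names.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), ensure_ascii=False, indent=2)


# Hypothetical sample row: the real export's column names may differ.
SAMPLE = 'id,title,place_made\n1,Model steam engine,"Swindon, Wiltshire, England"\n'

if __name__ == "__main__":
    print(csv_to_json(SAMPLE))
```

Every value comes back as a string, so dates and numbers would still need parsing before use.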

Finally, we’ve already had some useful feedback and even some improved files. Richard Light sent us a geocoded version of records from the National Railway Museum (NRM) (index of locations: http://api.sciencemuseum.org.uk/collections/updates_from_other_people/Richard_Light/nrm-geo-sort.xml (63kb), full file http://api.sciencemuseum.org.uk/collections/updates_from_other_people/Richard_Light/nrm-geo.xml – 20mb, browser-beware).

I’ll let Richard explain in his own words:

I converted the source CSV to XML using my CSV Converter program, which is a home-made program I wrote to do a “mail-merge” on CSV data, with the aim of easily generating other formats such as XML.

The geocoding was carried out by calls to my place URL-ifier program. This uses the standard Geonames query API, but splits a place description into its component place names (e.g. “Swindon, Wiltshire, England” becomes three place names) and searches for a “Swindon” contained within places “Wiltshire” and “England”.

I wrote an XSLT transform which copied the source document, and each time it found a place field, it called out to my URL-ifier using the document() function:

<xsl:template match="PLACE_MADE[text()!='']">
  <xsl:variable name="geonames"
    select="document(concat('http://light.demon.co.uk/scripts/getPlaceURL.exe?amp;q=', text()))/*/text()"/>
  <xsl:copy>
    <xsl:if test="$geonames!=''">
      <xsl:attribute name="geonamesId"><xsl:value-of select="$geonames"/></xsl:attribute>
    </xsl:if>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

Where this was successful in inferring a Geonames identifier, it added a “geonamesId” attribute to the PLACE_MADE field. So the result is a copy of the source data, with added geocoding.

All of the NRM data was geocoded in a single XSLT operation, but this operation had to call my URL-ifier, and hence the Geonames API, many times. There are limits on how hard you can hit this service, so care needs to be exercised! (You can get your own Geonames identifier for free, and then have your own allocation of API calls, if you want to use this service in a serious way.)

Now that the data contains Geonames URLs, you have access to all the background information about each place. All Geonames entries have lat/long co-ordinates (which is what you need to stick a pin on a map in your browser, using e.g. KML markup), but in addition will often have info such as population. You just need to make an HTTP request for the Geonames URL, specifying that you want RDF back, e.g.: http://light.demon.co.uk/scripts/cgiforwarder.exe?url=http://sws.geonames.org/2633352/&accept=rdf and process the RDF/XML which comes back.
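As a rough illustration of two steps in Richard's description (this is not his code), the Python sketch below splits a place description into its component place names and then pulls the geocoded PLACE_MADE values out of an XML file. The PLACE_MADE element and geonamesId attribute names come from the transform above; the wrapper elements, the sample record and the placeholder identifier are assumptions.

```python
import xml.etree.ElementTree as ET


def split_place(description: str) -> list[str]:
    """Split a comma-separated place description into its component names."""
    return [part.strip() for part in description.split(",") if part.strip()]


def geocoded_places(xml_text: str) -> list[tuple[str, str]]:
    """Return (place text, geonamesId) pairs for each geocoded PLACE_MADE element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter("PLACE_MADE"):
        geonames_id = element.get("geonamesId")
        if geonames_id:  # skip places where no identifier was inferred
            pairs.append(((element.text or "").strip(), geonames_id))
    return pairs


# Hypothetical sample: the wrapper element names and "EXAMPLE-ID" are placeholders.
SAMPLE = """<records>
  <record><PLACE_MADE geonamesId="EXAMPLE-ID">Swindon, Wiltshire, England</PLACE_MADE></record>
  <record><PLACE_MADE>unknown</PLACE_MADE></record>
</records>"""

if __name__ == "__main__":
    for place, geonames_id in geocoded_places(SAMPLE):
        print(split_place(place), geonames_id)
```

For the full 20mb file, a streaming parser such as ET.iterparse would probably be kinder to memory than loading the whole document at once.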

Personally, this kind of thing makes it all worthwhile – we can’t easily export our entire geographical hierarchy, so being able to geocode the imperfect data we have is really useful.

If you’ve done something interesting with our data we’d love to feature it. We’re also curious to know who’s having a look at it, even if you’re not at the point of having something to share.

Finally, I’d almost forgotten to thank the many wonderful people who’ve contributed to the Museums and the machine-processable web site or come along to #linkingmuseums meetups to work out how to get to re-usable museum data. I’ll be keeping up the wiki in future, and can be contacted @mia_out.

Collections data published

I originally posted this on the Science Museum developers blog.

I’m very excited about sharing this with you – we’ve just released 218,822 records about objects from the collections of the Science Museum, the National Media Museum and the National Railway Museum.

We’ve released the files as a lightweight experiment – we’d like to understand whether, and if so, how, people would use our data. We’d also like to explore the benefits for the museum and for programmers using our data – your feedback will inform decisions about future investment in more structured data as well as helping shape our understanding of the requirements of those users. The files are in CSV format – because it’s a really simple format, viewable in a text editor, we hope that it will be usable by most people.

We’ve published three data sets:

218,822 object records
40,596 media records
173 event records

The files are released under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) licence. Please get in touch if you’ve got ideas that require a commercial licence.

The files are available at ‘Documentation for collections data from Science Museum, National Media Museum, National Railway Museum (NMSI) released as CSV’. This page includes information about the fields available and the collections included.

The documentation page includes contact addresses, or you can leave a comment below.

Notes from Culture Hack Day (#chd11)

Culture Hack Day (#chd11) was organised by the Royal Opera House (the team being @rachelcoldicutt, @katybeale, @beyongolia, @mildlydiverting, @dracos – and congratulations to them all on an excellent event). As well as a hack event running over two days, they had a session of five minute 'lightning talks' on Saturday, with generous time for discussion between sessions. This worked quite well for providing an entry point to the event for the non-technical, and some interesting discussion resulted from it. My notes are particularly rough this time as I have one arm in a sling and typing my hand-written notes is slow.

Lightning Talks
Tom Uglow @tomux “What if the Web is a Fad?”
'We're good at managing data but not yet good at turning it into things that are more than points of data.' The future is about the physical world, making things real and touchable.

Clare Reddington, @clarered, “What if We Forget about Screens and Make Real Things?”
Some ace examples of real things: Dream Director; Nuage Vert (Helsinki power station projected power consumption of city onto smoke from station – changed people's behaviour through ambient augmentation of the city); Tweeture (a conch, 'permission object' designed to get people looking up from their screens, start conversations); National Vending Machine from Dutch museum.

Leila Johnston, @finalbullet talked about why the world is already fun, and looking at the world with fresh eyes. Chromaroma made Oyster cards into toys, playing with our digital footprint.

Discussion kicked off by Simon Jenkins about helping people get it (benefits of open data etc). CR: it's about organisational change, fears about transparency; directors don't come to events like this. Understand what's meant by value – cultural and social as well as economic. Don't forget audiences: it has to be meaningful for the people we're making it (cultural products) for.

Comment from @fidothe: Cultural heritage orgs have been screwed over by software companies. There's a disconnect between beautiful hacks around the edges and things that make people's lives easier. [Yes! People who work in cultural heritage orgs often have to deal with clunky tools, difficult or vendor-dependent data export processes, agencies that over-promise and under-deliver. In my experience, cultural orgs don't usually have internal skills for scoping and procuring software or selecting agencies so of course they get screwed over.]

TU: desire to be tangible is becoming more prevalent, data to enhance human experience, the relationship between culture and the way we live our lives.

CR: don't spend the rest of the afternoon reinforcing silos, shouldn't be a dichotomy between cultural heritage people and technologists. [Quick plug for http://museum30.ning.com/, http://groups.google.com/group/antiquist, http://museum-api.pbwiki.com/ and http://museumscomputergroup.org.uk/email-list/ as places where people interested in intersection between cultural heritage and technology can mingle – please let me know of any others!] Mutual respect is required.

Tom Armitage, @infovore “Sod big data and mashups: why not hack on making art?”
Making culture is more important than using it. 3 trends: 1) collection – tools to slice and dice across time or themes; 2) magic materials 3) mechanical art, displays the shape of the original content; 3a) satire – @kanyejordan 'a joke so good a machine could make it'.

Tom Dunbar, @willyouhelp – story-telling possibilities of metadata embedded in media e.g. video [check out Waisda? for a game designed to get metadata added to audio-visual archives]. Metadata could be actors, characters, props, action…

Discussion [?]: remixing in itself isn't always interesting. Skillful appropriation across formats… Universe of editors, filterers, not only creators. 'In editing you end up making new things'.

Matthew Somerville, @dracos, Theatricalia, “What if You Never Needed to Miss a Show?”
'Quite selfish', makes things he needs. Wants not to miss theatre productions with people he likes in/working on them. Theatricalia also collects stories about productions. [But in discussion it came up that the National Theatre asked him to remove data – why?! A recommendation system would definitely get me seeing more theatre, and I say that as a fairly regular but uninformed theatre-goer who relies on word-of-mouth to decide where to spend ticket money.]

Nick Harkaway, @Harkaway on IP and privacy
IP as a way of ringfencing intangible ideas, requiring consent to use. Privacy is the same. Not exciting, kind of annoying but need to find ways to make it work more smoothly while still providing protection. 'Buying is voting', if you buy from Tesco, you are endorsing their policies. 'Code for the change you want to see in the world', build the tools you want cultural orgs to have so they can do better. [Update: Nick has posted his own notes at Notes from Culture Hack Day. I really liked the way he brought ethical considerations to hack enthusiasm for pushing the boundaries of what's possible – the ability to say 'no' is important even if a pain for others.]

Chris Thorpe, @jaggeree. ArtFinder, “What if you could see through the walls of every museum and something could tell you if you’d like it?”

Culture for people who don't know much about culture. Cultural buildings obscure the content inside, stop people being surprised by what's available. It's hard if you don't know where to start. Go for user-centric information. Government Art Collection Explorer – ace! Wants an angel for art galleries to whisper information about the art in his ear. Wants people to look at the art, not the screen of their device [museums also have this concern]. SAP – situated audio platform. Wants a 'flight data recorder' for trips around cultural places.

Discussion around causes of fear and resistance to open data – what do cultural orgs fear and how can they learn more and relax? Fear of loss of provenance – response was that for developers displaying provenance alongside the data gives it credibility; counter-response was that organisations don't realise that's possible. [My view is that the easiest way to get this to change is to change the metrics by which cultural heritage organisations are judged, and resolve the tension between demands to commercialise content to supplement government grants and demands for open access to that same data. Many museums have developed hybrid 'free tombstone, low-res, paid-for high-res' models to deal with this, but it's taken years of negotiation in each institution.] I also ranted about some of these issues at OpenTech 2010, notes at 'Museums meet the 21st century'.

Other discussion and notes from twitter – re soap/drama characters tweeting – I managed to out myself as a Neighbours watcher but it was worth it to share that Neighbours characters tweet and use Facebook. Facebook relationship status updates and events have been included as plot points, and references are made to twitter but not to the accounts of the characters active on the service. I wonder if it's script writers or marketing people who write the characters' tweets? They also tweet in sync with the Australian showings, which raises issues around spoilers and international viewers.

Someone said 'people don't want to interact with cultural institutions online. They want to interact with their content' but I think that's really dependent on the definition of content – as pointed out, points of data have limited utility without further context. There's a catch-22 between cultural orgs not yet making really engaging data and audiences not yet demanding it; hopefully hack days like CHD11 help bridge the gap and turn data into stories and other meaningful content. We're coming up against the limits of what can be done programmatically, especially given variation in quality and extent of cultural heritage data (and most of it is data rather than content).

[Update: after writing this I found a post The lightning talks at Culture Hack Day about the day, which happily picks up on lots of bits I missed. Oh, and another, by Roo Reynolds.]

After the lightning talks I popped over the road to check out the hacking and ended up getting sucked in (the lure of free pizza had a powerful effect!). I worked on a WordPress plugin with Ian Ibbotson @ianibbo that lets you search for a term on the Culture Grid repository and imports the resulting objects into my museum metadata games so that you can play with objects based on your favourite topic. I've put the code on github [https://github.com/mialondon/mmg-import] and will move it from my staging server to live over the next few days so people can play with the objects. It's such a pain only having one hand, and I'm very grateful to Ian for the chance to work together and actually get some code written. This work means that any organisation that's contributed records to the Culture Grid can start to get back tags or facts to enhance their collections, based on data generated by people playing the games. The current 300-ish objects have about 4400 tags and 30 facts, so that's not bad for a freebie. OTOH, I don't know of many museums with the ability to display content created by others on their collections pages or store it in their collections management systems – something for another hack day?

Something I think I'll play around with a bit more is the idea of giving cultural heritage data a quality rating as it's ingested. We discussed whether the ratings would be local to an app (as they could be based on the particular requirements of that application) or generalised and recorded in the CultureGrid service. You could record the provenance of a rating, which might combine the benefits of both approaches. At the moment, my requirements for a 'high quality' record would be: title (e.g. 'The Ashes trophy', if the object has one), name or type of object (e.g. cup), date, place, decent sized image, description.
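To make the rating idea a bit more concrete, here's a minimal Python sketch that scores a record against those six requirements and notes who produced the rating. The field names, the rated_by label and the equal weighting are all assumptions for illustration, not anything Culture Grid defines, and it only checks that an image is present rather than whether it's a decent size.

```python
# Hypothetical field names standing in for: title, name or type of object,
# date, place, image, description.
REQUIRED_FIELDS = ("title", "object_type", "date", "place", "image_url", "description")


def rate_record(record: dict, rated_by: str) -> dict:
    """Score a record by the share of required fields that are non-empty.

    rated_by records the provenance of the rating (e.g. which app made it).
    """
    missing = [
        field for field in REQUIRED_FIELDS
        if not str(record.get(field) or "").strip()
    ]
    score = (len(REQUIRED_FIELDS) - len(missing)) / len(REQUIRED_FIELDS)
    return {"score": score, "missing": missing, "rated_by": rated_by}


if __name__ == "__main__":
    example = {"title": "The Ashes trophy", "object_type": "cup"}
    print(rate_record(example, rated_by="example-app"))
```

An app with different requirements could swap in its own field list or weights while still recording where each rating came from.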

Finally, if you're interested in hacking around cultural heritage data, there's also historyhackday next weekend. I'm hoping to pop in (dependent on fracture and MSc dissertation), not least because in March I'm starting a PhD in digital humanities, looking at participatory digitisation of geo-located historical material (i.e. getting people to share the transcriptions and other snippets of ad hoc digitisation they do as part of their research) and it's all hugely relevant.

Interview about museum metadata games and a pretty picture

I haven't had a chance to follow up Design constraints and research questions: museum metadata games with a post about the design process for the museum metadata games I've made for my dissertation project (because, stupidly, I slipped on black ice and damaged my wrist), so in the meantime here's a link to an interview Seb Chan did with me for the Fresh+New blog, Interview with Mia Ridge on museum metadata games, and a Wordle of the tags added so far.

There have been nearly 700 turns on the games so far, which have collectively added about 30 facts (Donald’s detective puzzle) and just over 3,700 tags (Dora’s lost data).