"The coolest thing to be done with your data will be thought of by someone else"

I discovered this ace quote, "the coolest thing to be done with your data will be thought of by someone else", on JISC's Common Repository Interfaces Group (CRIG) site, via The Repository Challenge. The CRIG was created to "help identify problem spaces in the repository landscape and suggest innovative solutions. The CRIG consists of a core group of technical, policy and development staff with repository interface expertise. It encourages anyone to join who is dedicated and passionate about surfacing scholarly content on the web."

Read 'repository or federated search' for 'repository' (or think of a federated search as a pseudo-repository) and 'scholarly' for 'cultural heritage' content, and it sounds like an awful lot of fun.

It's also the sentiment behind the UK Government's Show Us a Better Way, the Mashed Museum days and a whole bunch of similar projects.

What would you create with public (UK) information?

Show Us a Better Way want to know, and if your idea is good they might give you £20,000 to develop it to the next level.

Do you think that better use of public information could improve health, education, justice or society at large?

The UK Government wants to hear your ideas for new products that could improve the way public information is communicated.

Importantly, you don't need to be a geek:

You don't have to have any technical knowledge, nor any money, just a good idea, and 5 minutes spare to enter the competition.

And they've made "gigabytes of new or previously invisible public information" available for the project, including health, crime and education data (but no personal information).

Lonely Planet launch API

Lonely Planet launched their 'Explore API' and developer network at the BBC Mashed 2008 day. Available content includes 'destination content, including geocoded points of interest reviews, destination profiles, traveller-created "best of" lists and travel photographs' from their image library so as a travel junkie I'm already itching to have a play.

There's more background in this interview with Chris Heilmann and Chris Boden on the Yahoo! Developer blog, 'Lonely Planet starts developer program at mashed08 in London', but I thought it was worth pulling out this quote about the benefit of APIs, particularly as they're an organisation whose business model relies on its reputation and content:

Where do you see the benefit in releasing an API? How do you plan to monetize it or is it a loss-leader for you?

We don't have a funky web app like Twitter or Dopplr at this stage but we do have content – in a sense, that content is our platform. We want to take the Lonely Planet content and community experience onto relevant new platforms and make it accessible to travellers in new ways. We're not going to be able to do all of that on our own so we're looking to tap into external sources of innovation and creativity through open collaboration to help us imagine and execute the next generation of services that might enrich the lives of our community.

In terms of monetization, we'll look to work commercially with those developers who come up with innovations that we believe have the potential to create commercial value.

Quick and light solutions at 'UK Museums on the Web Conference 2008'

These are my notes from session 4, 'Quick and light solutions', of the UK Museums on the Web Conference 2008. In the interests of getting my notes up quickly I'm putting them up pretty much 'as is', so they're still rough around the edges. There are quite a few sections below which need to be updated when the presentations or photos of slides go online. [These notes would have been up a lot sooner if my laptop hadn't finally given up the ghost over the weekend.]

Frankie Roberto, 'The guerrilla approach to aggregating online collections'
He doesn't have slides, he's presenting using Firefox 3. [You can also read Frankie's post about his presentation on his blog.]

His projects came out of last year's mashed museum day, where the lack of re-usable cultural heritage data online was a real issue. Talk in the pub turned to 'the dark side' of obtaining data – screen scraping was one idea. Then the idea of FoI requests came up, and Frankie ended up sending Freedom of Information requests to national museums, asking for their data in any electronic format with some kind of structure.

He's not showing the site he presented at Montreal; it should be online soon and he'll release the code.

Frankie demonstrated the Science Museum object wiki.

[I found 'how it works' as focus of the object text on the Science Museum wiki a really interesting way of writing object descriptions, it could work well for other projects.]

He has concerns about big top down projects so he's suggesting five small or niche projects. He asked himself, how do people relate to objects?
1. Lots of people say, "I've got one of these" so: ivegotoneofthose.com – put objects up, people can hit button to say 'I have one of those'. The raw numbers could be interesting.
[I suggested this for Exploring 20th Century London at one point, but with a bit more user-generated content so that people could upload photos of their object at home or stories about how they got it, etc. I suppose ivegotoneofthose.com could be built so that it also lets people add content about their particular thing, then ideally that could be pulled back into and displayed on a museum site like Exploring. Would ivegotoneofthose.com sit on top of a federated collections search or would it have its own object list?]
2. Looking at TheyWorkForYou.com, he suggests: TheyCollectForYou.com – scan acquisition forms, publish feeds of which curators have bought what objects. [Bringing transparency to the acquisition process?]
3. Looking at howstuffworks.com, what about howstuffworked.com?
4. 'what should we collect next?' – opening up discourse on purchasing. Frankie took the quote from Indiana Jones: thatbelongsinamuseum.com – people can nominate things that should be in a museum.
5. pricelessartefact.com – [crowdsourcing object evaluation?] – comparing objects to see which is the most valuable, however 'valuable' is defined.
[Except that possibly opens the museum to further risk of having stuff nicked to order]

Fiona Romeo, 'Different ways of seeing online collections'
I didn't take many detailed notes for this paper, but you can see my notes on a previous presentation at Notes from 'Maritime Memorials, visualised' at MCG's Spring Conference.

Mapping – objects don't make a lot of sense by themselves, but are compelling as part of information about an expedition, or failed expedition.

They'll have new map and timeline content launching next month.

Stamen can share information about how they did their geocoding and stuff.

Giving your data out for creative re-use can be as easy as giving out a CSV file.
You always want to have an API or feed when doing any website.
The National Maritime Museum take any data set they can find without licensing restrictions and put it online for creative re-use.
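
[As a rough illustration of how little is needed to give out a CSV file, here's a minimal Python sketch. The field names and the two sample records are invented for the example; they aren't taken from the National Maritime Museum's data.]

```python
import csv
import io

# Hypothetical object records; the field names are assumptions for illustration.
RECORDS = [
    {"id": "OBJ-001", "title": "Ship's chronometer", "date": "1850", "place": "London"},
    {"id": "OBJ-002", "title": "Sextant, brass", "date": "1790", "place": "Greenwich"},
]


def records_to_csv(records, fieldnames):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv(RECORDS, ["id", "title", "date", "place"])
print(csv_text)
```

[The csv module quotes any value that contains a comma, so a title like 'Sextant, brass' survives a round trip through a spreadsheet or another script.]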

[Slide on approaches to data enhancement.]
Curation is the best approach but it's time-consuming.

Fiona spoke about her experiments at the mashed museum day – she cut and pasted transcript data into IBM's Many Eyes. It shows that really good tools are available, even if you don't have resources to work with a company like Stamen.

Mike Ellis presented a summary of the 'mashed museum' day held the day before.

Questions, wrap up session
Jon – always assume there (should be) an API

[A question I didn't ask but posted on twitter: who do we need to get in the room to make sure all these ideas for new approaches to data, to aggregation and federation, new types of experiences of cultural heritage data, etc, actually go somewhere?]

Paul on fears about putting content online: 'since the state of Florida put pictures of their beaches on their website, no-one goes to the beach anymore'.

Metrics:
Mike: need to go shout at DCMS about the metrics, need to use more meaningful metrics especially as thinking of something like APIs
Jon: watermark metadata… micro-marketing data.
Fiona: send it out with a wrapper. Make it embeddable.

Question from someone from Guernsey Museum about images online: once you've downloaded your nice image it's without metadata. George: Flickr like as much data in EXIF as possible. EXIF data isn't permanent but is useful.

Angela Murphy: wrappers are important for curators, as they're more willing to let things go if people can get back to the original source.

Me, referring back to the first session of the day: what were Lee Iverson's issues with the keynote speech? Lee: partly about the role of an institution like the BBC in modern space. National broadcaster should set social common ground, be a fundamental part of democratic discussion. It's even more important now because of the variety of sources out there, people shutting off or being selective about information sources to cope with information overload. Disparate sources mean no middle ground or possibility of discussion. BBC should 'let it go' – send the data out. The metric becomes how widely does it spread, where does it show up? If restricted to non-commercial use then [strangling use/innovation].

The 'net recommender' thing is a flawed metric – you don't recommend something you disagree with, something that is new or difficult knowledge. What gets recommended is a video of a cute 8 year old playing Guitar Hero really well. People avoid things that challenge them.

Fiona – the advantage of the 'net recommender' is it's taking judgement of quality outside the originating institution.

Paul wondered why 7 – 8 on a scale of 10 is neutral for British people; he would have thought it's 5 – 6.

Angela: we should push data to DCMS instead of expecting them to know what they could ask for.

George: it's an opportunity to change the way success is measured. Anita Roddick says 'when the community gives you wealth, it's time to give it back'. [Show, don't tell] – what would happen if you were to send a video of people engaging instead of just sending a spreadsheet?

Final round comments
Fiona: personal measure of success – creating culture of innovation, engagement, creating vibrant environment.

Paul: success is getting other people to agree with what we've been talking about [at the mashed museum day and conference] the past two days. [yes yes yes!] A measure of success was how a CEO reacted to discovering videos about their institution on YouTube – he didn't try to shut it down, but asked, 'how can we engage with that?'

Ross on 'take home' ideas for the conference
Collections – we conflate many definitions in our discussions – images, records, web pages about collections.

Our tone has changed. Delivery changed – realignment of axis of powers, MLA's Digital portfolio is disappearing, there's a vacuum. Who will fill it? The Collections Trust, National Museum Directors' Conference? Technology's not a problem, it's the cultural, human factors. We need to talk about where the tensions are, we've been papering over the cracks. Institutional relationships.

The language has changed – it was about digitisation, accessibility, funding. Three words today – beauty, poetry, life. We're entering an exciting moment.

What's the role of the Museums Computer Group – how and what can the MCG do?

Let's help our visitors get lost

In 'Community: From Little Things, Big Things Grow' on ALA, George Oates from Flickr says:

It's easy to get lost on Flickr. You click from here to there, this to that, then suddenly you look up and notice you've lost hours. Allow visitors to cut their own path through the place and they'll curate their own experiences. The idea that every Flickr visitor has an entirely different view of its content is both unsettling, because you can't control it, and liberating, because you’ve given control away. Embrace the idea that the site map might look more like a spider web than a hierarchy. There are natural links in content created by many, many different people. Everyone who uses a site like Flickr has an entirely different picture of it, so the question becomes, what can you do to suggest the next step in the display you design?

I've been thinking about something like this for a while, though the example I've used is Wikipedia. I have friends who've had to ban themselves from Wikipedia because they literally lose hours there after starting with one innocent question, then clicking onto an interesting link, then onto another…

That ability to lose yourself as you click from one interesting thing to another is exactly what I want for our museum sites: our visitor experience should be as seductive and serendipitous as browsing Wikipedia or Flickr.

And hey, if we look at the links visitors are making between our content, we might even learn something new about our content ourselves.

MultiMimsy database extractions and the possibilities for OAI-based collections repositories

I've uploaded my presentation slides from a talk for the UK MultiMimsy Users group in Docklands last month to MultiMimsy database extractions and the possibilities for OAI-based collections repositories at the Museum of London.

The first part discusses how to get from a set of data in a collections management system to a final published website, looking at the design process and technical considerations. Willoughby's use of Oracle on the back-end means that any ODBC-compliant application or database can query the underlying database and extract collections data.
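
The extraction step can be sketched roughly as follows. This is not one of our actual scripts: the table and column names are invented, and Python's built-in sqlite3 module stands in for an ODBC connection to Oracle so that the example runs on its own. The function only relies on the standard Python DB-API cursor methods, so in principle it should work with a connection from an ODBC driver too.

```python
import json
import sqlite3


def extract_records(connection, query):
    """Run a query on a DB-API connection and return the rows as dictionaries."""
    cursor = connection.cursor()
    cursor.execute(query)
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database: an in-memory SQLite table with invented columns.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE objects (object_id TEXT, title TEXT, publish INTEGER)")
connection.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [("A1", "Tea set", 1), ("A2", "Bus ticket", 0), ("A3", "Radio", 1)],
)

# Only extract the records flagged for publication.
records = extract_records(
    connection,
    "SELECT object_id, title FROM objects WHERE publish = 1 ORDER BY object_id",
)
print(json.dumps(records, indent=2))
```

Keeping the query separate from the function means the same code can be pointed at different views or tables as a project's data structures change.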

The paper then looks at some of the possibilities for the Museum of London's OAI-PMH repository. We've implemented an OAI repository for the People's Network Discover Service (PNDS) for Exploring 20th Century London (which also means we're set to get records into Europeana), but I hope that we can use the repository in lots of other ways, including the possibility of using our repository to serve data for federated searches.
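
For anyone who hasn't worked with OAI-PMH, here's a minimal sketch of what harvesting from a repository involves: building a ListRecords request and pulling identifiers and titles out of the response. The base URL, the sample record and the function names are made up for illustration and don't describe our repository; the verb, the oai_dc metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall(".//oai:record", NAMESPACES):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NAMESPACES
        )
        title = record.findtext(".//dc:title", default="", namespaces=NAMESPACES)
        results.append((identifier, title))
    return results


# Invented sample response, trimmed to the elements the parser reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Underground poster</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the resumptionToken that repositories return when a result set is split across several responses, which this sketch leaves out.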

There's currently some discussion internationally in the cultural heritage sector about repositories vs federated search, but I'm not sure it's an either/or choice. The reasons each is used are often to do with political or funding factors instead of the base technology, but either method, or both, could be used internally or externally depending on the requirements of the project and institution.

I can go into more detail about the scripts we use to extract data from MultiMimsy or send sample scripts if people are interested. They might be a good way to get started if you haven't extracted data from MultiMimsy before, but they won't generally be directly relevant to your data structures as the use of MultiMimsy can vary so widely between types of museums, collections and projects.

Calling geeks in the UK with an interest in cultural heritage content/audiences

You might be interested in BathCamp – a bar camp in Bath on a Saturday (with overnight stay) in late August. This is an initial open call so head along to the website (BathCamp) and check it out. Ideally you would have an interest in cultural heritage content, audiences or applications, but we love the idea of getting fresh perspectives from a wide range of people so we don't expect that you would have worked with the cultural heritage sector (museums, galleries, libraries, archives, archaeology) before.

How I do documentation: a column of bumph and a column of gold

All programmers hate documentation, right? But I've discovered a way to make it less painful and I'm posting in case it helps anyone else.

The first trick is to start documenting as soon as you start thinking about a project – well before you've written any code. I keep a running document of the work I've done, including the bits I'm about to try, information about links into other databases or applications, issues I need to think about or questions I need to ask someone, rude comments (I know, I look like such a nice girl), references, quick use cases, bits about functions, summary notes from meetings, etc.

Mostly I record by date, blog style. Doing it by date helps me link repository files, paper notes and emails with particular bits of work, which can otherwise be tricky if it's a while since you worked on a project or if you have lots of projects on the go. It's also handy if you need to record the time spent on different projects.

I just did it like this for a while, and it was ok, but I learnt the hard way that it took a while to sort through it when I needed to send someone else some documentation. Then I made a conscious decision to separate the random musings from the decisions and notes on the productive bits of code.

So now my document has two columns. The first column is all the bumph described above – the stuff I'd need if I wanted to retrace my steps or remind myself why I ended up doing things a certain way. The second column records key decisions or final solutions. This is your column of gold.

This way I can quickly run down the items in the second column, organise it by area instead of by date and come up with some good documentation without much effort. And if I ever want to write up the whole project, I've got a record of the whole process in the column of bumph.
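
I keep mine in an ordinary document, but if you'd rather keep the log as data, the same idea can be sketched in a few lines of Python. The class, the field names and the sample entries below are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class LogEntry:
    day: date
    area: str       # hypothetical label such as "database" or "search"
    bumph: str      # working notes, questions, dead ends
    gold: str = ""  # key decision or final solution, if there was one


def gold_by_area(entries):
    """Group the non-empty 'gold' notes by area, oldest first within each area."""
    grouped = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.day):
        if entry.gold:
            grouped[entry.area].append(entry.gold)
    return dict(grouped)


LOG = [
    LogEntry(date(2008, 6, 3), "search", "Tried wildcard queries, too slow.", "Use the full-text index."),
    LogEntry(date(2008, 6, 1), "database", "Asked about date formats.", "Store dates as ISO strings."),
    LogEntry(date(2008, 6, 2), "search", "Need to ask about stop words."),
]

for area, decisions in gold_by_area(LOG).items():
    print(area, decisions)
```

The bumph stays in the log by date, and the gold comes out organised by area, ready to be turned into documentation.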

You could add a third column to record outstanding tasks or questions. I tend to mark these up with colour and un-colour them when they're done. It just depends how you like to work.

It's amazingly simple, but it works. I hope it might be useful for you too. Or if you have any better suggestions (or a better title for this post), I'd love to hear them.

Move your FAQ to Wikipedia?

Mal Booth from the Australian War Memorial (AWM) makes the fascinating suggestion: they should move their entire Encyclopaedia to Wikipedia. Their encyclopaedia seems to function as a fully researched and referenced FAQ with content creation driven by public enquiries, and would probably sit well in Wikipedia.

In Wikipedia and "produsers", Mal says:

"Putting the content up on Wikipedia.org gives it MUCH wider exposure than our website ever can and it therefore has the potential to bring new users to our website that may not even know we exist (via links in to our own web content). With a wikipedia.org user account, we can maintain an appropriate amount of control over the content (more than we have at present over wikipedia content that started as ours, already put up there by others).

Another point is that putting it up on Wikipedia allows us to engage the assistance of various volunteers who'd like to help us, but don't live locally."

He also presents some good suggestions from their web developer, Adam: they should understand and participate in the Wikipedia community, and identify themselves as AWM professionals before importing content. I think they've taken the first step by assessing the suitability of their content for Wikipedia.

It's also an interesting example of an organisation that is willing to 'let go' of their content and allow it to be used and edited outside their institution. Mal's blog is a real find (and I'm not just saying that because it has 'Melbin' (Melbourne) in the title), and I'll be following the progress of their project with interest.

I wonder how issues of trust and authority will play out on their entries: by linking to the relevant Wikipedia entries, the AWM is giving those entries a level of authority they might not otherwise have. They're also placing a great deal of trust in Wikipedia authors.

Mal links to a post by Axel Bruns, Beyond Public Service Broadcasting: Produsage at the ABC, and summarises the four preconditions for good user-generated content:

  • the replacement of a hierarchy with a more open participatory structure;
  • recognising the power of the COMMUNITY to distinguish between constructive and destructive contributions;
  • allowing for random (granular, simple) acts of participation (like ratings); and
  • the development of shared rather than owned content that is able to be re-used, re-mixed or mashed up.

Bruns's post lists key principles that anyone "looking to develop successful and sustainable participatory media environments" should take into account. These points are defined and expanded on in the original post, which is well worth reading:

  1. Open Participation, Communal Evaluation
  2. Fluid Heterarchy, Ad Hoc Meritocracy
  3. Unfinished Artefacts, Continuing Process
  4. Common Property, Individual Rewards