
"The coolest thing to be done with your data will be thought of by someone else"

I discovered this ace quote, "the coolest thing to be done with your data will be thought of by someone else", on JISC's Common Repository Interfaces Group (CRIG) site, via The Repository Challenge. The CRIG was created to "help identify problem spaces in the repository landscape and suggest innovative solutions. The CRIG consists of a core group of technical, policy and development staff with repository interface expertise. It encourages anyone to join who is dedicated and passionate about surfacing scholarly content on the web."

Read 'repository or federated search' for 'repository' (or think of a federated search as a pseudo-repository) and 'cultural heritage' for 'scholarly' content, and it sounds like an awful lot of fun.

It's also the sentiment behind the UK Government's Show Us a Better Way, the Mashed Museum days and a whole bunch of similar projects.

What would you create with public (UK) information?

Show Us a Better Way want to know, and if your idea is good they might give you £20,000 to develop it to the next level.

Do you think that better use of public information could improve health, education, justice or society at large?

The UK Government wants to hear your ideas for new products that could improve the way public information is communicated.

Importantly, you don't need to be a geek:

You don't have to have any technical knowledge, nor any money, just a good idea, and 5 minutes spare to enter the competition.

And they've made "gigabytes of new or previously invisible public information" available for the project, including health, crime and education data (but no personal information).

Lonely Planet launch API

Lonely Planet launched their 'Explore API' and developer network at the BBC Mashed 2008 day. Available content includes 'destination content, including geocoded points of interest reviews, destination profiles, traveller-created "best of" lists and travel photographs' from their image library, so as a travel junkie I'm already itching to have a play.

There's more background in this interview with Chris Heilmann and Chris Boden on the Yahoo! Developer blog, 'Lonely Planet starts developer program at mashed08 in London', but I thought it was worth pulling out this quote about the benefit of APIs, particularly as they're an organisation whose business model relies on its reputation and content:

Where do you see the benefit in releasing an API? How do you plan to monetize it or is it a loss-leader for you?

We don't have a funky web app like Twitter or Dopplr at this stage but we do have content – in a sense, that content is our platform. We want to take the Lonely Planet content and community experience onto relevant new platforms and make it accessible to travellers in new ways. We're not going to be able to do all of that on our own so we're looking to tap into external sources of innovation and creativity through open collaboration to help us imagine and execute the next generation of services that might enrich the lives of our community.

In terms of monetization, we'll look to work commercially with those developers who come up with innovations that we believe have the potential to create commercial value.

Quick and light solutions at 'UK Museums on the Web Conference 2008'

These are my notes from session 4, 'Quick and light solutions', of the UK Museums on the Web Conference 2008. In the interests of getting my notes up quickly I'm putting them up pretty much 'as is', so they're still rough around the edges. There are quite a few sections below which need to be updated when the presentations or photos of slides go online. [These notes would have been up a lot sooner if my laptop hadn't finally given up the ghost over the weekend.]

Frankie Roberto, 'The guerrilla approach to aggregating online collections'
He doesn't have slides, he's presenting using Firefox 3. [You can also read Frankie's post about his presentation on his blog.]

His projects came out of last year's mashed museum day, where the lack of re-usable cultural heritage data online was a real issue. Talk in the pub turned to 'the dark side' of obtaining data – screen scraping was one idea. Then the idea of FoI requests came up, and Frankie ended up sending Freedom of Information requests to national museums, asking for their collections data in any electronic format with some kind of structure.

He's not showing the site he presented at Montreal; it should be online soon and he'll release the code.

Frankie demonstrated the Science Museum object wiki.

[I found 'how it works' as the focus of the object text on the Science Museum wiki a really interesting way of writing object descriptions; it could work well for other projects.]

He has concerns about big top down projects so he's suggesting five small or niche projects. He asked himself, how do people relate to objects?
1. Lots of people say, "I've got one of these" so: ivegotoneofthose.com – put objects up, people can hit button to say 'I have one of those'. The raw numbers could be interesting.
[I suggested this for Exploring 20th Century London at one point, but with a bit more user-generated content so that people could upload photos of their object at home or stories about how they got it, etc. I suppose ivegotoneofthose.com could be built so that it also lets people add content about their particular thing, then ideally that could be pulled back into and displayed on a museum site like Exploring. Would ivegotoneofthose.com sit on top of a federated collections search or would it have its own object list?]
2. Looking at TheyWorkForYou.com, he suggests: TheyCollectForYou.com – scan acquisition forms, publish feeds of which curators have bought what objects. [Bringing transparency to the acquisition process?]
3. Looking at howstuffworks.com, what about howstuffworked.com?
4. 'what should we collect next?' – opening up discourse on purchasing. Frankie took the quote from Indiana Jones: thatbelongsinamuseum.com – people can nominate things that should be in a museum.
5. pricelessartefact.com – [crowdsourcing object evaluation?] – comparing objects to see which is the most valuable, however 'valuable' is defined.
[Except that possibly opens the museum to further risk of having stuff nicked to order]

Fiona Romeo, 'Different ways of seeing online collections'
I didn't take many detailed notes for this paper, but you can see my notes on a previous presentation at Notes from 'Maritime Memorials, visualised' at MCG's Spring Conference.

Mapping – objects don't make a lot of sense by themselves, but are compelling as part of information about an expedition, or failed expedition.

They'll have new map and timeline content launching next month.

Stamen can share information about how they did their geocoding and stuff.

Giving your data out for creative re-use can be as easy as giving out a CSV file.
You always want to have an API or feed when doing any website.
The National Maritime Museum take any data set they can find without licensing restrictions and put it online for creative re-use.
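
[To show how little is involved in the 'just give out a CSV file' idea, here's a minimal Python sketch. The records and field names are made up for illustration; they aren't the National Maritime Museum's data or schema.]

```python
import csv
import io

# Hypothetical object records; the field names are illustrative only.
RECORDS = [
    {"id": "OBJ-1", "title": "Ship's chronometer", "date": "c. 1800", "place": "London"},
    {"id": "OBJ-2", "title": "Sextant, brass", "date": "1850-1860", "place": ""},
]


def records_to_csv(records, fieldnames):
    """Serialise a list of dicts to CSV text, one row per object."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently skip any fields we don't want to publish
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = records_to_csv(RECORDS, ["id", "title", "date", "place"])
print(csv_text)
```

The csv module quotes values that contain commas (like the second title), so the file should open cleanly in a spreadsheet or another script.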

[Slide on approaches to data enhancement.]
Curation is the best approach but it's time-consuming.

Fiona spoke about her experiments at the mashed museum day – she cut and pasted transcript data into IBM's Many Eyes. It shows that really good tools are available, even if you don't have resources to work with a company like Stamen.

Mike Ellis presented a summary of the 'mashed museum' day held the day before.

Questions, wrap up session
Jon – always assume there (should be) an API

[A question I didn't ask but posted on twitter: who do we need to get in the room to make sure all these ideas for new approaches to data, to aggregation and federation, new types of experiences of cultural heritage data, etc, actually go somewhere?]

Paul on fears about putting content online: 'since the state of Florida put pictures of their beaches on their website, no-one goes to the beach anymore'.

Metrics:
Mike: need to go shout at DCMS about the metrics, need to use more meaningful metrics especially as thinking of something like APIs
Jon: watermark metadata… micro-marketing data.
Fiona: send it out with a wrapper. Make it embeddable.

Question from someone from Guernsey Museum about images online: once you've downloaded your nice image it's without metadata. George: Flickr like as much data in EXIF as possible. EXIF data isn't permanent but is useful.

Angela Murphy: wrappers are important for curators, as they're more willing to let things go if people can get back to the original source.

Me, referring back to the first session of the day: what were Lee Iverson's issues with the keynote speech? Lee: partly about the role of institutions like the BBC in modern space. National broadcaster should set social common ground, be a fundamental part of democratic discussion. It's even more important now because of the variety of sources out there, people shutting off or being selective about information sources to cope with information overload. Disparate sources mean no middle ground or possibility of discussion. BBC should 'let it go' – send the data out. The metric becomes how widely does it spread, where does it show up? If restricted to non-commercial use then [strangling use/innovation].

The 'net recommender' thing is a flawed metric – you don't recommend something you disagree with, something that is new or difficult knowledge. What gets recommended is a video of a cute 8 year old playing Guitar Hero really well. People avoid things that challenge them.

Fiona – the advantage of the 'net recommender' is it's taking judgement of quality outside the originating institution.

Paul wondered why 7 – 8 on a scale of 10 is neutral for British people; he would have thought it's 5 – 6.

Angela: we should push data to DCMS instead of expecting them to know what they could ask for.

George: it's opportunity to change the way success is measured. Anita Roddick says 'when the community gives you wealth, it's time to give it back'. [Show, don't tell] – what would happen if you were to send a video of people engaging instead of just sending a spreadsheet?

Final round comments
Fiona: personal measure of success – creating culture of innovation, engagement, creating vibrant environment.

Paul: success is getting other people to agree with what we've been talking about [at the mashed museum day and conference] the past two days. [yes yes yes!] A measure of success was how a CEO reacted to discovering videos about their institution on YouTube – he didn't try to shut it down, but asked, 'how can we engage with that?'

Ross on 'take home' ideas for the conference
Collections – we conflate many definitions in our discussions – images, records, web pages about collections.

Our tone has changed. Delivery changed – realignment of axis of powers, MLA's Digital portfolio is disappearing, there's a vacuum. Who will fill it? The Collections Trust, National Museum Directors' Conference? Technology's not a problem, it's the cultural, human factors. We need to talk about where the tensions are, we've been papering over the cracks. Institutional relationships.

The language has changed – it was about digitisation, accessibility, funding. Three words today – beauty, poetry, life. We're entering an exciting moment.

What's the role of the Museums Computer Group – how and what can the MCG do?

Notes from 'Aggregating Museum Data – Use Issues' at MW2008

These are my notes from the session 'Aggregating Museum Data – Use Issues' at Museums and the Web, Montreal, April 2008.

These notes are pretty rough so apologies for any mistakes; I hope they're a bit useful to people, even though it's so late after the event. I've tried to include most of what was covered but it's taken me a while to catch up on some of my notes and recollection is fading. Any comments or corrections are welcome, and the comments in [square brackets] below are me. All the Museums and the Web conference papers and notes I've blogged have been tagged with 'MW2008'.

This session was introduced by David Bearman, and included two papers:
Exploring museum collections online: the quantitative method by Frankie Roberto and Uniting the shanty towns – data combining across multiple institutions by Seb Chan.

David Bearman: the intentionality of the production of data process is interesting i.e. the data Frankie and Seb used wasn't designed for integration.

Frankie Roberto, Exploring museum collections online: the quantitative method (slides)
He didn't give a crap about the quality of the data, it was all about numbers – get as much as possible to see what he could do with it.

The project wasn't entirely authorised or part of his daily routine. It came in part from debates after the museum mash-up day.

Three problems with mashing museum data: getting it, (getting the right) structure, (dealing with) dodgy data

Traditional solutions:
Getting it – APIs
Structure – metadata standards
Dodgy data – hard work (get curators to fix it)

But it doesn't have to be perfect, it just has to be "good enough". Or "assez bon" (and he hopes that translation is good enough).

Options for getting it – screen scrapers, or Freedom of Information (FOI) requests.

FOI request – simple set of fields in machine-readable format.

Structure – some logic in the mapping into simple format.

Dodgy data – go for 'good enough'.

Presenting objects online: existing model – doesn't give you a sense of the archive, the collection, as it's about the individual pages.

So what was he hoping for?
Who, what, where, when, how. ['Why' is the other traditional journalist's question, but it's too difficult to capture in structured information.]

And what did he get?
Who: hoping for collection/curator – no data.
What: hoping for 'this is an x'. Instead got categories (based on museum internal structures).
Where: lots of variation – 1496 unique strings. The specificity of terms varies on geographic and historical dimensions.
When: lots of variation
How: hoping for donation/purchase/loan. Got a long list of varied stuff.
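
[A rough illustration of the 'good enough' approach to the 'how' field: this Python sketch tallies free-text acquisition strings into the donation/purchase/loan buckets Frankie was hoping for. It isn't Frankie's code; the keyword rules and sample strings are my own assumptions, and a real collection would need a much longer list.]

```python
from collections import Counter

# Hypothetical keyword rules, checked in order; first match wins.
RULES = [
    ("donation", ("gift", "donat", "presented", "bequest", "bequeath")),
    ("purchase", ("purchase", "bought")),
    ("loan", ("loan",)),
]


def normalise_method(raw):
    """Map a free-text acquisition string to a coarse category."""
    text = raw.strip().lower()
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "unknown"  # 'good enough' means admitting what we can't classify


def summarise(values):
    """Count how many records fall into each coarse category."""
    return Counter(normalise_method(value) for value in values)


# Made-up sample values, standing in for the 'long list of varied stuff'.
sample = [
    "Gift of Mrs Smith",
    "Purchased at auction, 1923",
    "On loan from a private collection",
    "Found in store",
    "bequest",
]

print(len(set(sample)), "unique strings")
print(summarise(sample))
```

Because it works on counts and not individual records, the odd misclassified string matters much less than it would on an object page, which is the point of the collection-level view.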

[There were lots of bits about whacking the data together that made people around me (and me, at times) wince. But it took me a while to realise it was a collection-level view, not an individual object view – I guess that's just a reflection of how I think about digital collections – so that doesn't matter as much as if you were reading actual object records. And I'm a bit daft cos the clue ('quantitative') was in the title.

A big part of the museum publication process is making crappy date and location and classification data correct, pretty and human-readable, so the variation Frankie found in data isn't surprising. Catalogues are designed for managing collections, not for publication (though might curators also over-state the case because they'd always rather everything was tidied than published in a possibly incorrect or messy state?).

It would have been interesting to hear how the chosen fields related to the intended audience, but it might also have been just a reasonable place to start – somewhere 'good enough' – I'm sure Frankie will correct me if I'm wrong.]

It will be on museum-collections.org. Frankie showed some stuff with Google graph APIs.

Prior art – Pitt Rivers Museum – analysis of collections, 'a picture of Englishness'.

Lessons from politics: theyworkforyou for curators.

Issues: visualisations count all objects equally. e.g. lots of coins vs bigger objects. [Probably just as well no natural history collections then. Damn ants!]

Interactions – present user comments/data back to museums?

Whose role is it anyway, to analyse collections data? And what about private collections?

Sebastian Chan, Uniting the shanty towns – data combining across multiple institutions (slides)
[A paraphrase from the introduction: Seb's team are artists who are also nerds (?)]

Paper is about dealing with the reality of mixing data.

Mess is good, but… mess makes smooshing things together hard. Trying to agree on standards takes a long time, you'll never get anything built.

Combination of methods – scraping + trust-o-meter to mediate 'risk' of taking in data from multiple sources.

Semantic web in practice – dbpedia.

Open Calais – bought out from Clearforest by Reuters. Dynamically generated metadata tags about 'entities' e.g. possible authority records. There are problems with automatically generated data e.g. guesses at people, organisations, whatever might not be right. 'But it's good enough'. Can then build onto it so users can browse by people then link to other sites with more information records about them in other datasets.

[But can museums generally cope with 'good enough'? What does that do to ideas of 'authority'? If it's machine-generated because there's not enough time for a person in the museum to do it, is there enough time for a person in the museum to clean it? OTOH, the Powerhouse model shows you can crowdsource the cleaning of tags so why not entities. And imagine if we could connect Powerhouse objects in Sydney with data about locations or people in London held at the Museum of London – authority versus utility?

Do we need to critically examine and change the environment in which catalogue data is viewed so that the reputation of our curators/finds specialists in some of the more critical (bitchy) or competitive fields isn't affected by this kind of exposure? I know it's a problem in archaeology too.]

They've published an OpenSearch feed as GeoRSS.
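
[For anyone wondering what consuming a GeoRSS feed involves, here's a small Python sketch that pulls titles and coordinates out of an Atom feed carrying GeoRSS Simple points. The sample feed is invented for illustration; it isn't the Powerhouse feed, and I'm assuming Atom as the feed format.]

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Atom and GeoRSS Simple.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}

# Invented sample feed: one entry with a location, one without.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Example collection search results</title>
  <entry>
    <title>Example object A</title>
    <georss:point>-33.8785 151.1995</georss:point>
  </entry>
  <entry>
    <title>Example object B (no location)</title>
  </entry>
</feed>"""


def entries_with_points(feed_xml):
    """Return (title, latitude, longitude) for each entry that has a georss:point."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        point = entry.findtext("georss:point", default=None, namespaces=NS)
        if point is None:
            continue  # not every record will be geocoded
        # GeoRSS Simple points are "latitude longitude", separated by whitespace.
        lat, lon = (float(part) for part in point.split())
        results.append((title, lat, lon))
    return results


located = entries_with_points(SAMPLE_FEED)
print(located)
```

Once you have the coordinates as numbers, plotting them on a map or asking 'what's near me' is a matter for whatever mapping tool you like.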

Fire eagle, Yahoo beta product. Link it to other data sets so you can see what's near you. [If you can get on the beta.]

I think that was the end, and the next bits were questions and discussion.

David Bearman: regarding linked authority files… if we wait until everything is perfect before getting it out there, then "all curators have to die before we can put anything on the web", "just bloody experiment".

Nate (Walker): is 'good enough' good enough? What about involving museums in creating better and correcting data. [I think, correct me if not]
Seb: no reason why a museum community shouldn't create an OpenCalais equivalent. David: Calais knows what Reuters know about data. [So we should get together as a sector, nationally or internationally, or as art, science, history museums, and teach it about museum data.]

David – almost saying 'make the uncertainty an opportunity' in museum data – open it up to the public as you may find the answers. Crowdsource the data quality processes in cataloguing! "we find out more by admitting we know less".

Seb – geo-location is critical to allowing communities to engage with this material.

Frankie – doing a big database dump every few months could be enough of an API.

Location sensitive devices are going to be huge.

Seb – we think of search in a very particular way, but we don't know how people want to search i.e. what they want to search for, how they find stuff. [This is one of the sessions that made me think about faceted browsing.]

"Selling a virtual museum to a director is easier than saying 'put all our stuff there and let people take it'".

Tim Hart (Museum Victoria) – is the data from the public going back into the collection management system? Seb – yep. There's no field in EMu for some of the stuff that OpenCalais has, but the use of it from OpenCalais makes a really good business case for putting it into EMu.

Seb – we need tools to create metadata for us, we don't and won't have resources to do it with humans.

Seb – Commons on Flickr is good experiment in giving stuff away. Freebase – not sure if go to that level.

Overall, this was a great session – lots of ideas for small and large things museums can do with digital collections, and it generated lots of interesting and engaged discussion.

[It's interesting, we opened up the dataset from Çatalhöyük for download so that people could make their own interpretations and/or remix the data, but we never got around to implementing interfaces so people could contribute or upload the knowledge they created back to the project, or how to use the queries they'd run.]

Let's help our visitors get lost

In 'Community: From Little Things, Big Things Grow' on ALA, George Oates from Flickr says:

It's easy to get lost on Flickr. You click from here to there, this to that, then suddenly you look up and notice you've lost hours. Allow visitors to cut their own path through the place and they'll curate their own experiences. The idea that every Flickr visitor has an entirely different view of its content is both unsettling, because you can't control it, and liberating, because you’ve given control away. Embrace the idea that the site map might look more like a spider web than a hierarchy. There are natural links in content created by many, many different people. Everyone who uses a site like Flickr has an entirely different picture of it, so the question becomes, what can you do to suggest the next step in the display you design?

I've been thinking about something like this for a while, though the example I've used is Wikipedia. I have friends who've had to ban themselves from Wikipedia because they literally lose hours there after starting with one innocent question, then clicking onto an interesting link, then onto another…

That ability to lose yourself as you click from one interesting thing to another is exactly what I want for our museum sites: our visitor experience should be as seductive and serendipitous as browsing Wikipedia or Flickr.

And hey, if we look at the links visitors are making between our content, we might even learn something new about our content ourselves.

MultiMimsy database extractions and the possibilities for OAI-based collections repositories

I've uploaded my presentation slides from a talk for the UK MultiMimsy Users group in Docklands last month to MultiMimsy database extractions and the possibilities for OAI-based collections repositories at the Museum of London.

The first part discusses how to get from a set of data in a collections management system to a final published website, looking at the design process and technical considerations. Willoughby's use of Oracle on the back-end means that any ODBC-compliant application can query the underlying database and extract collections data.

The paper then looks at some of the possibilities for the Museum of London's OAI-PMH repository. We've implemented an OAI repository for the People's Network Discover Service (PNDS) for Exploring 20th Century London (which also means we're set to get records into Europeana), but I hope that we can use the repository in lots of other ways, including the possibility of using our repository to serve data for federated searches.
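
If you haven't worked with OAI-PMH before, harvesting boils down to requesting pages of records and following a resumption token until it runs out. Here's a minimal Python sketch of those two steps. It isn't our production code: the base URL, identifiers and token are made up, and it parses a canned response so it runs without a network connection.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a (hypothetical) repository."""
    if resumption_token:
        # When resuming, the token is sent on its own, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Invented sample response, trimmed to record headers only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:object-1</identifier>
        <datestamp>2008-06-01</datestamp>
      </header>
    </record>
    <record>
      <header>
        <identifier>oai:example.org:object-2</identifier>
        <datestamp>2008-06-02</datestamp>
      </header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_page(response_xml):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(response_xml)
    identifiers = [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or None)  # an empty token means the last page


first_url = list_records_url("https://example.org/oai")
identifiers, next_token = parse_page(SAMPLE_RESPONSE)
next_url = list_records_url("https://example.org/oai", resumption_token=next_token)
print(first_url)
print(identifiers, next_token)
print(next_url)
```

A federated search could sit on the same repository by harvesting on a schedule and indexing the results, which is one reason I don't see repositories and federated search as an either/or choice.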

There's currently some discussion internationally in the cultural heritage sector about repositories vs federated search, but I'm not sure it's an either/or choice. The reasons each are used are often to do with political or funding factors instead of the base technology, but either method, or both, could be used internally or externally depending on the requirements of the project and institution.

I can go into more detail about the scripts we use to extract data from MultiMimsy or send sample scripts if people are interested. They might be a good way to get started if you haven't extracted data from MultiMimsy before, but they won't generally be directly relevant to your data structures as the use of MultiMimsy can vary so widely between types of museums, collections and projects.

Calling geeks in the UK with an interest in cultural heritage content/audiences

You might be interested in BathCamp – a bar camp in Bath on a Saturday (with overnight stay) in late August. This is an initial open call so head along to the website (BathCamp) and check it out. Ideally you would have an interest in cultural heritage content, audiences or applications, but we love the idea of getting fresh perspectives from a wide range of people so we don't expect that you would have worked with the cultural heritage sector (museums, galleries, libraries, archives, archaeology) before.

How I do documentation: a column of bumph and a column of gold

All programmers hate documentation, right? But I've discovered a way to make it less painful and I'm posting in case it helps anyone else.

The first trick is to start documenting as soon as you start thinking about a project – well before you've written any code. I keep a running document of the work I've done, including the bits I'm about to try, information about links into other databases or applications, issues I need to think about or questions I need to ask someone, rude comments (I know, I look like such a nice girl), references, quick use cases, bits about functions, summary notes from meetings, etc.

Mostly I record by date, blog style. Doing it by date helps me link repository files, paper notes and emails with particular bits of work, which can otherwise be tricky if it's a while since you worked on a project or if you have lots of projects on the go. It's also handy if you need to record the time spent on different projects.

I just did it like this for a while, and it was ok, but I learnt the hard way that it takes a while to sort through it if I needed to send someone else some documentation. Then I made a conscious decision to separate the random musings from the decisions and notes on the productive bits of code.

So now my document has two columns. The first column is all the bumph described above – the stuff I'd need if I wanted to retrace my steps or remind myself why I ended up doing things a certain way. The second column records key decisions or final solutions. This is your column of gold.

This way I can quickly run down the items in the second column, organise it by area instead of by date and come up with some good documentation without much effort. And if I ever want to write up the whole project, I've got a record of the whole process in the column of bumph.

You could add a third column to record outstanding tasks or questions. I tend to mark these up with colour and un-colour them when they're done. It just depends how you like to work.

It's amazingly simple, but it works. I hope it might be useful for you too. Or if you have any better suggestions (or a better title for this post), I'd love to hear them.

Move your FAQ to Wikipedia?

Mal Booth from the Australian War Memorial (AWM) makes the fascinating suggestion: they should move their entire Encyclopaedia to Wikipedia. Their encyclopaedia seems to function as a fully researched and referenced FAQ with content creation driven by public enquiries, and would probably sit well in Wikipedia.

In Wikipedia and "produsers", Mal says:

"Putting the content up on Wikipedia.org gives it MUCH wider exposure than our website ever can and it therefore has the potential to bring new users to our website that may not even know we exist (via links in to our own web content). With a wikipedia.org user account, we can maintain an appropriate amount of control over the content (more than we have at present over wikipedia content that started as ours, already put up there by others).

Another point is that putting it up on Wikipedia allows us to engage the assistance of various volunteers who'd like to help us, but don't live locally."

He also presents some good suggestions from their web developer, Adam: they should understand and participate in the Wikipedia community, and identify themselves as AWM professionals before importing content. I think they've taken the first step by assessing the suitability of their content for Wikipedia.

It's also an interesting example of an organisation that is willing to 'let go' of their content and allow it to be used and edited outside their institution. Mal's blog is a real find (and I'm not just saying that because it has 'Melbin' (Melbourne) in the title), and I'll be following the progress of their project with interest.

I wonder how issues of trust and authority will play out on their entries: by linking to the relevant Wikipedia entries, the AWM is giving those entries a level of authority they might not otherwise have. They're also placing a great deal of trust in Wikipedia authors.

Mal links to a post by Axel Bruns, Beyond Public Service Broadcasting: Produsage at the ABC, and summarises the four preconditions for good user-generated content:

the replacement of a hierarchy with a more open participatory structure;
recognising the power of the COMMUNITY to distinguish between constructive and destructive contributions;
allowing for random (granular, simple) acts of participation (like ratings); and
the development of shared rather than owned content that is able to be re-used, re-mixed or mashed up.

Bruns's post lists key principles that anyone "looking to develop successful and sustainable participatory media environments" should take into account. These points are defined and expanded on in the original post, which is well worth reading:

Open Participation, Communal Evaluation
Fluid Heterarchy, Ad Hoc Meritocracy
Unfinished Artefacts, Continuing Process
Common Property, Individual Rewards