semantic web – Open Objects

Usability: the key that unlocks geeky goodness

This is a quick pointer to three posts about some usability work I did for the JISC-funded Pelagios project, and a reflection on the process. Pelagios aims to 'help introduce Linked Open Data goodness into online resources that refer to places in the Ancient World'. The project has already done a lot of great work with its partners to bring many different data sources together, but they wanted to find out whether the various visualisations (particularly the graph explorer) let users discover the full potential of the linked data sets.

I posted on the project blog about how I worked out a testing plan to encourage user-centred design and set up the usability sessions in Evaluating Pelagios' usability, set out how a test session runs (with sample scripts and tasks) in Evaluating usability: what happens in a user testing session? and finally I posted some early Pelagios usability testing results. The results are from a very small sample of potential users, but those users were consistent in the issues and the positive results they uncovered.

The wider lesson for LOD-LAM (linked open data in library, archives, museums) projects is that user testing (and/or a strong user-centred design process) helps general audiences (including subject specialists) appreciate the full potential of a technically-led project – without thoughtful design, the results of all those hours of code may go unloved by the people they were written for. In other words, user experience design is the key that unlocks the geeky goodness that drives these projects. It's old news, but the joy of user testing is that it reminds you of what's really important…

What would Phar Lap do? AKA, what happens when Facebook and museum URIs meet a dead horse?

Phar Lap was a famous race horse. After he died (in film-worthy suspicious circumstances), bits of Phar Lap ended up in three different museums – his skin is at Melbourne Museum, his skeleton is at Te Papa in Wellington, NZ, and his heart is in Canberra at the National Museum of Australia.

I've always been fascinated by the way the public respond to Phar Lap – when I worked at Museum Victoria, the outreach team would regularly get emails written to Phar Lap by people who had seen the film or somehow come across his story. (I was also never quite sure why they thought emailing a dead horse would work). So when I first heard that Phar Lap was on Facebook, I was curious to see which museum would have 'claimed' Phar Lap. Does possession of the most charismatic object (the hide) make it easier for Melbourne Museum to step up as the presence of Phar Lap on social media, or were they just the first to be in that space? The issues around 'ownership' and right to speak for an iconic object like Phar Lap make a brilliant case study for how museums represent their collections online.

And today, when I came across three posts (Responses to "Progress on Museum URIs", Progress on Museum URIs by @sebastianheath, Identifying Objects in Museum Collections by @ekansa) on movements towards stable museum URIs that problematised the "politics of naming and identifying cultural heritage" and the concept of the "exclusive right of museums to identify their objects", I thought of Phar Lap. (Which is nice, cos 80 years and one day ago he won the Melbourne Cup).

Of the three museums that own bits of the dead horse, which gets to publish the canonical digital record about Phar Lap? I hope the question sounds silly enough to highlight the challenges and opportunities in translating physical models to the digital realm. Of course each museum can publish a record (specifically, mint a URI) about Phar Lap (and I hope they do) but none of the museums could prevent the others from publishing (and hopefully they wouldn't want to).

Or as the various blog posts said, "many agents can assert an identity for an object, with those identities together forming a distributed and diverse commentary on the human past", and museums need to play their part: "a common identifier promoted by and discoverable at the holding institution will ease the process of recognizing that two or more identifiers refer to the 'same thing'".
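Recognising that several identifiers refer to the 'same thing' is, mechanically, a grouping problem. A minimal Python sketch, assuming each publisher exposes pairwise 'same as' assertions (the example.org URIs are hypothetical placeholders, not real museum identifiers): a union-find structure puts every URI that is linked, directly or through a chain of assertions, into one group.

```python
from collections import defaultdict


class IdentityIndex:
    """Groups URIs asserted to identify the same thing, using union-find."""

    def __init__(self):
        self.parent = {}

    def _find(self, uri):
        # Unseen URIs start in a group of their own.
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            # Path halving: point each visited node at its grandparent.
            self.parent[uri] = self.parent[self.parent[uri]]
            uri = self.parent[uri]
        return uri

    def assert_same(self, a, b):
        """Record that URIs a and b refer to the same thing."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_thing(self, a, b):
        return self._find(a) == self._find(b)

    def groups(self):
        """Return every cluster of equivalent URIs as a sorted list."""
        clusters = defaultdict(list)
        for uri in list(self.parent):
            clusters[self._find(uri)].append(uri)
        return sorted(sorted(members) for members in clusters.values())


# Hypothetical URIs, one per museum, each naming the horse itself.
index = IdentityIndex()
index.assert_same("http://melbourne.example.org/id/phar-lap",
                  "http://tepapa.example.org/id/phar-lap")
index.assert_same("http://tepapa.example.org/id/phar-lap",
                  "http://nma.example.org/id/phar-lap")
```

The three hypothetical URIs here identify the horse itself; the hide, skeleton and heart would each still need their own URIs, since they are different objects.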

Of course it's not that simple, and if you're interested in the questions the museum sector (by which I hopefully don't only mean me) is grappling with, the museums and the machine-processable web page on Permanent IDs has links to discussions on the MCG list, and I've wrestled a bit with how URIs might look at the Science Museum/NMSI (and I need to go back and review the comments left by various generous people). I'd love to know what other museums are planning, and what consumers of the data might need, so that we can come up with a robust common model for museum URIs.

And to reward you for getting this far, here is a picture of Phar Lap on Facebook as his skin and bones are about to be re-united:

Linking museums: machine-readable data in cultural heritage – meetup in London July 7

Somehow I've ended up organising a (very informal) event about 'Linking museums: machine-readable data in cultural heritage' on Wednesday, July 7, at a pub near Liverpool St Station. I have no real idea what to expect, but I'd love some feisty sceptics to show up and challenge people to make all these geeky acronyms work in the real museum world.

As I posted to the MCG list: "A very informal meetup to discuss 'Linking museums: machine-readable data in cultural heritage' is happening next Wednesday. I'm hoping for a good mix of people with different levels of experience and different perspectives on the issue of publishing data that can be re-used outside the institution that created it. … please do pass this on to others who may be interested. If you would like to come but can't get down to that London, please feel free to send me your questions and comments (or beer money)."

The basic details are: July 7, 2010, Shooting Star pub, London. 7:30 – 10pm-ish. More information is available at http://museum-api.pbworks.com/July-2010-meetup and you can let me know you're coming or register your interest.

In more detail…

Why?
I'm trying to cut through the chicken and egg problem – as a museum technologist, I can work towards getting machine-readable data available, but I'm not sure which formats and what data would be most useful for developers who might use it. Without a critical mass of take-up for any one type, the benefits of any one data source are more limited for developers. But museums seem to want a sense of where the critical mass is going to be so they can build for that. How do we cut through this and come up with a sensible roadmap?

Who?
You! If you're interested in using museum data in mashups but find it difficult to get started or find the data available isn't easily usable; if you have data you want to publish; if you work in a museum and have a data publication problem you'd like help in solving; if you are a cheerleader for your favourite acronym…

Put another way, this event is for you if you're interested in publishing and sharing data about your museums and collections through technologies such as linked data and microformats.

It'll be pretty informal! I'm not sure how much we can get done but it'd be nice to put faces to names, and maybe start some discussions around the various problems that could be solved and tools that could be created with machine-readable data in cultural heritage.

Some thoughts on linked data and the Science Museum – comments?

I've been meaning to finish this for ages so I could post it, but then I realised it's more use in public in imperfect form than in private, so here goes – my thoughts on linked data, APIs and the Science Museum on the 'Museums and the machine-processable web' wiki. I'm still trying to find time to finish documenting my thoughts, and I've already had several useful comments that mean I'll need to update it, but I'd love to hear your thoughts, comments, etc.

Final thoughts on open hack day (and an imaginary curatr)

I think hack days are great – sure, 24 hours in one space is an artificial constraint, but the sheer brilliance of the ideas and the ingenuity of the implementations is inspiring. They're a reminder that good projects don't need to take years and involve twenty circles of sign-off, even if that's the reality you face when you get back to the office.

I went because it tied in really well with some work projects (like the museum metadata mashup competition we're running later in the year or the attempt to get a critical mass of vaguely compatible museum data available for re-use) and stuff I'm interested in personally (like modern bluestocking, my project for this summer – let me know if you want to help, or just add inspiring women to freebase).

I'm also interested in creating something like a Dopplr for museums – you tell it what you're interested in, and when you go on a trip it makes you a map and list of stuff you could see while you're in that city.

Like: I like Picasso, Islamic miniatures, city museums, free wine at contemporary art gallery openings, [etc]; am inspired by early feminist history; love hearing about lived moments in local history of the area I'll be staying in; I'm going to Barcelona.

The 'list of cultural heritage stuff I like' could be drawn from stuff you've bookmarked, exhibitions you've attended (or reviewed) or stuff favourited in a meta-museum site.

(I don't know what you'd call this – it's like a personal butlr or concierge who knows both your interests and your destinations – curatr?)
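At its simplest, a curatr is a filter that matches a traveller's interests against what's in the destination city. A toy Python sketch, assuming a hypothetical list of venues already tagged by city and subject (the venues and tags are invented for illustration, not drawn from any real dataset):

```python
from dataclasses import dataclass, field


@dataclass
class Venue:
    name: str
    city: str
    tags: set = field(default_factory=set)


def plan_trip(interests, destination, venues):
    """Return (venue name, matched interests) for the destination, best matches first."""
    wanted = {tag.lower() for tag in interests}
    scored = []
    for venue in venues:
        if venue.city.lower() != destination.lower():
            continue
        overlap = wanted & {tag.lower() for tag in venue.tags}
        if overlap:
            scored.append((len(overlap), venue.name, sorted(overlap)))
    # Most overlapping interests first; ties broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, matched) for _, name, matched in scored]


# Hypothetical data; in practice it might come from bookmarks or exhibition reviews.
venues = [
    Venue("Example Picasso Gallery", "Barcelona", {"picasso", "modern art"}),
    Venue("Example City History Museum", "Barcelona", {"city museums", "local history"}),
    Venue("Example Design Museum", "Barcelona", {"design"}),
    Venue("Example Miniatures Collection", "London", {"islamic miniatures"}),
]
interests = ["Picasso", "Islamic miniatures", "city museums", "local history"]
result = plan_trip(interests, "Barcelona", venues)
```

Scoring by simple tag overlap is only a starting point; the harder part is getting venues and collections tagged consistently enough to match against.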

The talks on RDFa (and the earlier talk on YQL at the National Maritime Museum) have inspired me to pick a 'good enough' protocol, implement it, and see if I can bring in links to similar objects in other museum collections. I need to think about the best way to document any mapping I do between taxonomies, ontologies, vocabularies (all the museumy 'ies') and different API functions or schemas, but I figure the museum API wiki is a good place to draft that. It's not going to happen instantly, but it's a good goal for 2009.

These are the last of my notes from the weekend's Open Hack London event; my notes from the various talks are tagged openhacklondon.

Tom Morris, SPARQL and semweb stuff – tech talk at Open Hack London

Tom Morris gave a lightning talk on 'How to use Semantic Web data in your hack' (aka SPARQL and semantic web stuff).

He's since posted his links and queries – excellent links to endpoints you can test queries in.

Semantic web often thought of as long-promised magical elixir, he's here to say it can be used now by showing examples of queries that can be run against semantic web services. He'll demonstrate two different online datasets and one database that can be installed on your own machine.

First – dbpedia – scraped lots of wikipedia, put it into a database. dbpedia isn't like your average database, you can't draw a UML diagram of wikipedia. It's done in RDF and Linked Data. Can be queried in a language that looks like SQL but isn't. SPARQL – is a W3C standard, they're currently working on SPARQL 2.

Go to dbpedia.org/sparql – submit query as post. [Really nice – I have a thing about APIs and platforms needing a really easy way to get you to 'hello world' and this does it pretty well.]
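A sketch of that 'hello world' step in Python, using only the standard library. It assumes the public endpoint at https://dbpedia.org/sparql accepts a form-encoded POST with a `query` parameter (as the SPARQL protocol describes) and honours an Accept header asking for JSON results; the query is an illustrative example, not one from the talk. The request is built but not sent, so it can be inspected offline.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: up to ten things typed as museums, with English labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
SELECT ?museum ?label WHERE {
  ?museum a dbo:Museum ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_sparql_post(endpoint, query):
    """Build (but do not send) a SPARQL protocol POST request asking for JSON results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_sparql_post(ENDPOINT, QUERY)
# To actually send it (needs network access):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```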

[Line by line comments on the syntax of the queries might be useful, though they're pretty readable as it is.]

'select thingy, wotsit where [the slightly more complicated stuff]'

Can get back results in XML, also HTML, 'spreadsheet', JSON. Ugly but readable. Typed.
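'Typed' means each value in the results says what kind of RDF term it is (uri, literal or bnode), with optional language or datatype. A small Python sketch that flattens JSON results into rows; the sample below is hand-written in the shape of the W3C SPARQL JSON results format, with example.org URIs, and is not real DBpedia output.

```python
import json

# Hand-written sample in the shape of the W3C SPARQL JSON results format.
SAMPLE = """
{
  "head": {"vars": ["museum", "label"]},
  "results": {"bindings": [
    {"museum": {"type": "uri", "value": "http://example.org/resource/Example_Museum"},
     "label": {"type": "literal", "xml:lang": "en", "value": "Example Museum"}},
    {"museum": {"type": "uri", "value": "http://example.org/resource/Another_Museum"}}
  ]}
}
"""


def rows(results_json):
    """Flatten each result into {variable: value}, using None for unbound variables."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in doc["results"]["bindings"]
    ]


def term_types(results_json):
    """Report the RDF term type of each variable bound in the first result."""
    doc = json.loads(results_json)
    first = doc["results"]["bindings"][0]
    return {var: term["type"] for var, term in first.items()}


table = rows(SAMPLE)
```

Unbound variables simply don't appear in a binding, which is why the second row needs the None default.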

[Trying a query challenge set by others could be fun way to get started learning it.]

One problem – fictional places are in Wikipedia e.g. Liberty City in Grand Theft Auto.

Libris – how library websites should be
[I never used to appreciate how much most library websites suck until I started back at uni and had to use one for more than one query every few years]

Has a query interface through SPARQL

Comment from the audience: BBC now have a SPARQL endpoint [as of the day before? Go BBC guy!].

Playing with Mulgara, an open source Java triple store. [Mulgara looks like a kinda faceted search/browse thing] Has its own query language called TQL which can do more interesting things than SPARQL. Why use it? Schemaless data storage. Is to SQL what dynamic typing is to static typing. [did he mean 'is to SPARQL'?]
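To make 'schemaless' concrete: a triple store only holds (subject, predicate, object) statements, so a new kind of fact needs no schema change. This toy Python sketch is not Mulgara's API or TQL; it keeps triples in a set and answers patterns where None acts as a wildcard. The ex: names are invented.

```python
class TripleStore:
    """A toy in-memory triple store: no schema, just (subject, predicate, object)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None matches anything in that position."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self.triples
            if all(p is None or p == t for p, t in zip(pattern, triple))
        )


store = TripleStore()
store.add("ex:PharLap", "ex:type", "ex:Racehorse")
store.add("ex:PharLap", "ex:won", "ex:MelbourneCup1930")
# A brand-new predicate needs no ALTER TABLE equivalent: just add the triple.
store.add("ex:PharLap", "ex:heartHeldBy", "ex:NationalMuseumOfAustralia")
```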

Question from the audience: how do you discover what you can query against?
Answer: dbpedia website should list the concepts they have in there. Also some documentation of categories you can look at. [Examples and documentation are so damn important for the uptake of your API/web service.]
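Alongside the documentation, you can ask the endpoint itself what it contains. A hedged Python sketch that builds GET URLs for two standard SPARQL discovery queries (distinct classes, distinct properties); whether a large public endpoint answers them quickly, or at all given timeouts and result limits, is an assumption.

```python
import urllib.parse

# Standard SPARQL: which classes are resources typed with?
DISCOVER_CLASSES = "SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 100"

# Which properties actually appear in the data?
DISCOVER_PROPERTIES = "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 100"


def sparql_get_url(endpoint, query):
    """Encode a query as the `query` parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


url = sparql_get_url("https://dbpedia.org/sparql", DISCOVER_CLASSES)
```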

Coming soon [?] SPARUL – update language, SPARQL2: new features

The end!

[These are more (very) rough notes from the weekend's Open Hack London event – please let me know of clarifications, questions, links or comments. My other notes from the event are tagged openhacklondon.

Quick plug: if you're a developer interested in using cultural heritage (museums, libraries, archives, galleries, archaeology, history, science, whatever) data – a bunch of cultural heritage geeks would like to know what's useful for you (more background here). You can comment on the #chAPI wiki, or tweet @miaridge (or @mia_out). Or if you work for a company that works with cultural heritage organisations, you can help us work better with you for better results for our users.]

There were other lightning talks on Pachube (pronounced 'patchbay', about trying to build the internet of things, making an API for gadgets because e.g. connecting hardware to the web is hard for small makers) and Homera (an open source 3D game engine).

Rasmus Lerdorf on Hacking with PHP – tech talk at Open Hack London

Same deal as my first post from today's Open Hack London event – these are (very) rough notes, please let me know of clarifications, questions or comments.

Hacking with PHP, Rasmus Lerdorf

Goal of talk: copy and pastable snippets that just work so you don't have to fight to get things that work [there's not enough of this to help beginners get over that initial hump]. The slides are available at http://talks.php.net/show/openhack and these notes are probably best read as commentary alongside the code examples.

[Since it's a hack day, some] Hack ideas: fix something you use every day; build your own targeted search engine; improve the look of search results; play with semantic web tools to make the web more semantic; tell the world what kind of data you have – if a resume, use hResume or other appropriate microformats/markup; go local – tools for helping your local community; hack for good – make the world a better place.

SearchMonkey and BOSS are blending together a little bit.

What we need to learn
With PHP – enough to handle simple requests; talk to backend datastore; how to parse XML with PHP, how to generate JSON, some basic JavaScript, a JavaScript utility library like YUI or jQuery.

parsing XML: simplexml_load_file() – can load entire URL or local file.

Attributes on a node are accessed like an array. For namespaced elements, call children() on the node and pass the namespace as the argument.
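For comparison, the same three steps (load a document, read an attribute, reach namespaced children) look like this with Python's standard-library ElementTree. This is a parallel sketch, not a translation of the slides; the georss namespace URI is real, but the feed content is invented.

```python
import xml.etree.ElementTree as ET

NS = {"georss": "http://www.georss.org/georss"}

FEED = """<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <item id="1">
      <title>Example place</title>
      <georss:point>51.5 -0.1</georss:point>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return the feed's version attribute and its items with coordinates."""
    # For a file or an open URL response, ET.parse(source).getroot() does the same job.
    root = ET.fromstring(xml_text)
    version = root.attrib.get("version")  # attributes behave like a dict
    items = []
    for item in root.iter("item"):
        # Namespaced children are found by passing the prefix-to-URI mapping.
        point = item.findtext("georss:point", namespaces=NS)
        lat, lon = (float(value) for value in point.split())
        items.append({"id": item.get("id"), "title": item.findtext("title"),
                      "lat": lat, "lon": lon})
    return version, items


version, items = read_items(FEED)
```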

Now know how to parse XML, can get lots of other stuff.
Context extraction service, Yahoo – doesn't get enough attention. Post all text, gives you back four or five key terms – can then do an image search off them. Or match ads to webpages.

Can use GET or POST (curl) – there's usually too much text for GET.
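The GET-or-POST choice comes down to size: a whole page of text doesn't fit comfortably in a URL. A Python sketch of that decision; the service URL and the `context` parameter name are hypothetical placeholders (the Yahoo service itself is not assumed to still exist), and the 2,000-character cut-off is a conservative rule of thumb, not a limit from any specification.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and parameter name, for illustration only.
SERVICE_URL = "https://terms.example.org/extract"
MAX_GET_URL_LENGTH = 2000  # rule-of-thumb cut-off, not a standard


def build_extraction_request(text):
    """Use GET for short text; switch to POST when the URL would get too long."""
    encoded = urllib.parse.urlencode({"context": text})
    get_url = SERVICE_URL + "?" + encoded
    if len(get_url) <= MAX_GET_URL_LENGTH:
        return urllib.request.Request(get_url, method="GET")
    return urllib.request.Request(
        SERVICE_URL,
        data=encoded.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```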

PHP to JavaScript on initial page load: json_encode() -> JavaScript.
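The same pattern works outside PHP: serialise server-side data to JSON and write it into the page as a JavaScript literal. A Python sketch (the `pageData` variable name is a hypothetical choice). Escaping `<`, `>` and `&` matters because data containing `</script>` would otherwise end the script block early; those characters can only occur inside JSON strings, where `\u003c`-style escapes are valid.

```python
import json


def json_for_script(data):
    """Serialise data to JSON that is safe to embed inside an HTML <script> block."""
    text = json.dumps(data)
    # Escape characters that could close the script tag or start markup.
    return text.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")


def render_page(data):
    """Return a <script> block that hands the data to client-side JavaScript."""
    return (
        "<script>\n"
        "var pageData = " + json_for_script(data) + ";\n"
        "</script>"
    )


page = render_page({"title": "</script><b>sneaky</b>", "count": 3})
```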

Javascript to PHP (and back)
If you can figure out these six lines of code, you can write anything in the world. How every modern web application works.
Server-side php, client-side javascript.

'There's nothing to building web applications, you just have to break everything down into small enough chunks that it all becomes trivial'.

AJAX in 30 seconds.
Inline comments in code would help for people reading it without hearing the talk at the same time.
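The server half of that round trip is small in any language. A minimal Python WSGI sketch, standing in for the PHP on the slides rather than reproducing it: the browser requests a URL such as `/lookup?q=...` and the server answers with JSON that the page's JavaScript can use to update itself. The `/lookup` path, `q` parameter and response fields are hypothetical.

```python
import json
from urllib.parse import parse_qs


def app(environ, start_response):
    """WSGI app: answer GET /lookup?q=... with JSON for the page's JavaScript to use."""
    if environ.get("PATH_INFO") != "/lookup":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    # A real app would query its datastore here; this sketch just echoes the input.
    payload = {"query": query, "length": len(query)}
    body = json.dumps(payload).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

On the client, an XMLHttpRequest (or a library call such as jQuery's getJSON) fetches the URL, parses the JSON and updates the page. For local testing, `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()` from the standard library will serve it.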

JavaScript libraries to the rescue
load maps API, create container (div) for the map, then fill it.

Form – on submit call return updateMap(); with new location.

YGeoRSS – if have GeoRSS file… can point to it.

GeoPlanet – assigns a WOEID to a place. Locations are more than just a lat/long – they carry way more information. Basically gives you a foreign key. YQL is starting to make the web a giant database. Can make joins across APIs – the WOEID works as a foreign key.

YQL – 'combines all the APIs on the web into a single API'.
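The 'WOEID as foreign key' point is just a join. YQL expressed this as a query across tables; this Python sketch does the equivalent inner join in memory, on two hypothetical result sets (say, photos from one API and place details from another) that both carry a `woeid` field. The IDs and records are invented.

```python
from collections import defaultdict


def join_on_woeid(left, right):
    """Inner-join two lists of dicts on 'woeid', like a SQL join on a foreign key."""
    by_woeid = defaultdict(list)
    for record in right:
        by_woeid[record["woeid"]].append(record)
    joined = []
    for record in left:
        for match in by_woeid.get(record["woeid"], []):
            joined.append({**record, **match})
    return joined


# Invented records standing in for responses from two different APIs.
photos = [
    {"woeid": 1001, "photo": "harbour.jpg"},
    {"woeid": 1002, "photo": "museum.jpg"},
    {"woeid": 9999, "photo": "unknown.jpg"},
]
places = [
    {"woeid": 1001, "place": "Example Harbour", "country": "Exampleland"},
    {"woeid": 1002, "place": "Example Museum Quarter", "country": "Exampleland"},
]
result = join_on_woeid(photos, places)
```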

Add a cache – nice to YQL, and also good for demos etc. Copy and paste cache function from his slides – does a local cache on URL. Hashed with md5. Using PHP streams – #defn. Adding a cache speeds up developing when hacking (esp as won't be waiting for the wifi). [This is a pretty damn good tip cos it's really useful and not immediately obvious.]
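The caching tip carries over directly. A Python sketch of a local cache keyed on the md5 hash of the URL (md5 is fine here because it only names files; it isn't protecting anything). The cache directory is a hypothetical choice, and the `fetch` function is injectable so the logic can be tried without a network.

```python
import hashlib
import os
import tempfile
import urllib.request

# Hypothetical location; any writable directory would do.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "hack-day-cache")


def fetch_url(url):
    """Fetch a URL over the network (only called on a cache miss)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_get(url, cache_dir=CACHE_DIR, fetch=fetch_url):
    """Return the body for url, reusing a local copy named by the md5 of the URL."""
    os.makedirs(cache_dir, exist_ok=True)
    key = hashlib.md5(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        with open(path, "rb") as cached:
            return cached.read()
    body = fetch(url)
    with open(path, "wb") as out:
        out.write(body)
    return body
```

There's no expiry: delete the directory when you want fresh data. During a hack that's usually acceptable, since the point is to stop waiting on the wifi.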

XPath on URL using PHP's OAuth extension

SearchMonkey – social engineering people into caring about semantic data on the web. For non-geeks, search plug-in mechanism that will spruce up search results page. Encourages people to add semantic data so their search result is as sexy as their competitors – so goal is that people will start adding semantic data.

'If you're doing web stuff, and don't know about microformats, and your resume doesn't have hResume, you're not getting a job with Yahoo.'

Question: how are microformats different to RDFa?
Answer: there are different types of microformats – some very specific ones, e.g. hResume, hCal. RDFa – adding arbitrary tags to a page, even if there's no specific way to describe your data. But there's a standard set of mark-ups for a resume so can use that. If your data doesn't match anything at microformats.org then use RDFa or eRDF (?).

RDFa, SearchMonkey – tech talks at Open Hack London

While today's Open Hack London event is mostly about the 24-hour hackathon, I signed up just for the Tech Talks because I couldn't afford to miss a whole weekend's study in the fortnight before my exams (stupid exams). I went to the sessions on 'Guardian Data Store and APIs', 'RDFa SearchMonkey', Arduino, 'Hacking with PHP', 'BBC Backstage', Dopplr's 'mashups made of messages' and lightning talks including 'SPARQL and semantic web' stuff you can do now.

I'm putting my rough and ready notes online so that those who couldn't make it can still get some of the benefits. Apologies for any mishearings or mistakes in transcription – leave me a comment with any questions or clarifications.

One of the reasons I was going was to push my thinking about the best ways to provide API-like access to museum information and collections, so my notes will reflect that but I try to generalise where I can. And if you have thoughts on what you'd like cultural heritage institutions to do for developers, let us know! (For background, here's a lightning talk I did at another hack event on happy museums + happy developers = happy punters).

RDFa – now everyone can have an API.
Mark Birkbeck

Going to cover some basic mark-up, and talk about why RDFa is a good thing. [The slides would be useful for the syntax examples, I'll update if they go online.]

RDFa is a new syntax from W3C – a way of embedding metadata (RDF) in HTML documents using attributes.

e.g. <span property="dc:title"> – value of property is the text inside the span.

Because it's inline, the same document carries both the metadata and the presentation HTML, so you don't need to point to another document as the source of the metadata.

One big advance is that you can provide metadata for other items, e.g. images, so you can attach licence info to the image rather than the page it's in – e.g. <img src="" rel="license" resource="[creative commons licence]">
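A deliberately minimal Python sketch of what a consumer does with those attributes, using only the standard-library HTML parser. It is not a conforming RDFa processor: it ignores prefix declarations, `about`/`typeof` subject handling and `content` overrides, and simply pairs each `property` with its element's text and each `rel` with its `resource`. Like most parsers, it walks the markup rather than checking the doctype. The sample page is invented.

```python
from html.parser import HTMLParser

# HTML elements that never have an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class MiniRDFaParser(HTMLParser):
    """Pairs each `property` with its element's text, and each `rel` with its `resource`."""

    def __init__(self):
        super().__init__()
        self.properties = []  # (property, text) pairs
        self.links = []       # (rel, resource) pairs
        self._stack = []      # [tag, property or None, text chunks] for open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("rel") and attrs.get("resource"):
            self.links.append((attrs["rel"], attrs["resource"]))
        if tag not in VOID_ELEMENTS:
            self._stack.append([tag, attrs.get("property"), []])

    def handle_data(self, data):
        # Text counts towards every open element, so nested markup is included.
        for entry in self._stack:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._stack or self._stack[-1][0] != tag:
            return
        _, prop, chunks = self._stack.pop()
        if prop:
            self.properties.append((prop, "".join(chunks).strip()))


PAGE = ('<p><span property="dc:title">Phar Lap</span> was a racehorse.</p>'
        '<img src="horse.jpg" rel="license" '
        'resource="http://creativecommons.org/licenses/by/3.0/">')

parser = MiniRDFaParser()
parser.feed(PAGE)
parser.close()
```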

Putting RDFa into web pages means you've now got a feed (the web page is the RSS feed), and a simple static web page can become an API that can be consumed in the same way as stuff from a big expensive system. 'Growing adoption'.

Government department Central Office of Information [?] is quite big on RDFa, have a number of projects with it. [I'd come across the UK Civil Service Job Service API while looking for examples for work presentations on APIs.]

RDFa allows for flexible publishing options. If you're already publishing HTML, you can add RDFa mark-up then get flexible publishing models – different departments can keep publishing data in their own way, a central website can go and request from each of them and create its own database of e.g. jobs. Decentralised way of approaching data distribution.

Can be consumed by: smarter browsers; client-side AJAX, other servers such as SearchMonkey.

He's interested in where browsers can do something with it – either enhanced browsers that could e.g. store contact info in a page into your address book; or develop JavaScript libraries that can parse page and do something with it. [screen shot of jobs data in search monkey with enhanced search results]

RDFa might be going into Drupal core.

Example of putting an ISBN in RDFa in a page, then a parser can go through the page, pull out the triples [some explanation of them as mini db?], pull back more info about the book from other APIs e.g. Amazon – full title, thumbnail of cover. e.g. Pipes.

Example of FOAF – twitter account marked up in page, can pull in tweets. Could presumably pull in newer services as more things were added, without having to re-mark-up all the pages.

Example of chemist writing a blog who mentions a chemical compound in blog post, a processor can go off and retrieve more info – e.g. add icon for mouseover info – image of molecule, or link to more info.

Next plan is to link with BOSS. Can get back RDFa from search results – augment search results with RDFa from the original page.

Search Monkey (what it is and what you can do with it)
Neil Crosby (European frontend architect for search at Yahoo).

SearchMonkey is (one of) Yahoo's open search platforms (along with BOSS). Uses structured data to enhance search results. You get to change stuff on Yahoo search results page.

SearchMonkey lets you: style results for certain URL patterns; brand those results; make the results more useful for users.

[examples of sites that have done it to see how their results look in Yahoo? I thought he mentioned IMDb but it doesn't look any different – a film search that returns a wikipedia result, OTOH, does.]

Make life better for users – not just what Yahoo thinks results should be, you can say 'actually this is the important info on the page'

Three ways to do it [to change the SERP (search engine results page)]: mark up data in a way that Yahoo knows about – 'just structure your data nicely', e.g. video mark-up; enhance a result directly; make an infobar.

Infobar – doesn't change the result you see immediately on the page, but it opens on the page. Example of an auto-enhanced result: playcrafter. Link to developer start page – how to mark it up, with examples, and what it all means.

User-enhanced result – Facebook profile pages are marked up with microformats – can add as friend, poke, send message, view friends, etc from the search results page. Can change the title and abstract, add image, favicon, quicklinks, key/value pairs. Create at [link I can't see but is on slides] Displayed in screen, you fill it out on a template.

Infobar – dropdown in grey bar under results. Can do a lot more, as it's hidden in the infobar and doesn't have to worry people.

Data from: microformats, RDF, XSLT, Yahoo's index, and soon, top tags from delicious.

If no machine data, can write an XSLT. 'isn't that hard'. Lots of documentation on the web.

Examples of things that have been made – a tool that exposes all the metadata known for a page. URL on slide. Can install on Yahoo search page, add it in. Use location data to make a map – any page on web with metadata about locations on it – map monkey. Get Qype results for anything you search for.

There's a mailing list (people willing and wanting to answer questions) and a tutorial.

Questions

Question: do you need to use a special doctype [for RDFa]?
Answer: added to spec that 'you should use this doctype' but the spec allows for RDFa to be used in situations when can't change doctype e.g. RDFa embedded in blogger blogpost. Most parsers walk the DOM rather than relying on the doctype.

Jim O'D – excited that SearchMonkey supports XSLT – if have website with correctly marked up tables, could expose those as key/value pairs?
Answer: yes. XSLT fantastic tool for when don't have data marked up – can still get to it.
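SearchMonkey did this with XSLT; the underlying idea (turn a well-structured two-column table into key/value pairs) can be sketched in Python with the standard-library HTML parser. This illustrates the technique, not SearchMonkey's mechanism, and assumes explicitly closed cells and rows, each row being a label cell followed by a value cell. The sample table is invented.

```python
from html.parser import HTMLParser


class TablePairs(HTMLParser):
    """Turns rows of a two-column HTML table into a dict of label -> value."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._row = None   # cell texts for the current <tr>
        self._cell = None  # text chunks for the current <th>/<td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Collapse internal whitespace in the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 2:  # only label/value rows become pairs
                label, value = self._row
                self.pairs[label.rstrip(":")] = value
            self._row = None


TABLE = """<table>
  <tr><th>Object</th><td>Phar Lap's heart</td></tr>
  <tr><th>Held by:</th><td>National Museum of Australia</td></tr>
  <tr><td>not a pair</td></tr>
</table>"""

table_parser = TablePairs()
table_parser.feed(TABLE)
table_parser.close()
```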

Frankie – question I couldn't hear. About info out to users?
Answer: if you've built a monkey, up to you to tell people about it for the moment. Some monkeys are auto-on e.g. Facebook, wikipedia… possibly in future, if developed a monkey for a site you own, might be able to turn it auto-on in the results for all users… not sure yet if they'll do it or not.
Frankie: plan that people get monkeys they want, or go through gallery?
Answer: would be fantastic if could work out what people are using them for and suggest ones appropriate to people doing particular kinds of searches, rather than having to go to a gallery.

Tim Berners-Lee at TED on 'database hugging' and linked data

This TED talk by Tim Berners-Lee: The next Web of open, linked data is worth watching if you've been 'wondering whatever happened to the semantic web?', or what this 'linked data' is all about.

I've put some notes below – I was transcribing it for myself and thought I might as well share it. It's only a selection of the talk and I haven't tidied it because they're not my words to edit.

Why is linked data important?

Making the world run better by making this data available. If you know about some data in some government department you often find that, these people, they're very tempted to keep it, to hug your database, you don't want to let it go until you've made a beautiful website for it. … Who am I to say "don't make a website…" make a beautiful website, but first, give us the unadulterated data. Give us the raw data now.

You have no idea, the number of excuses people come up with to hang onto their data and not give it to you, even though you've paid for it.

Communicating science over the web… the people who are going to solve those are scientists, they have half-formed ideas in their head, but a lot of the state of knowledge of the human race at the moment is in database, currently not sharing. Alzheimer's scientists … the power of being able ask questions which bridge across different disciplines is really a complete sea-change, it's very, very important. Scientists are totally stymied at the moment, the power of the data that other scientists have collected is locked up, and we need to get it unlocked so we can tackle those huge problems. if I go on like this you'll think [all data from] huge institutions but it's not. [Social networking is data.]

Linked data is about people doing their bit to produce their bit, and it all connecting. That's how linked data works. … You do your bit, everybody else does theirs. You may not have much data yourself, to put on there, but you know to demand it.

It's not just about the number of places where data comes. It's about connecting it together. When you connect it together you get this power… out of it. It'll only really pay off when everybody else has done it. It's called Linked Data, I want you to make it, I want you to demand it.

User-generated mashups in natural language?

In case you missed it elsewhere, check out Mozilla Labs' video and blog post on Introducing Ubiquity – 'An experiment into connecting the Web with language'.

It's a framework that brings together lots of the bits of functionality that are available with browser extensions and bookmarklets and lets the user run them with natural language commands. One of the goals is to "enable on-demand, user-generated mashups with existing open Web APIs. (In other words, allowing everyone–not just Web developers–to remix the Web so it fits their needs, no matter what page they are on, or what they are doing.)".

It's a long way from being ubiquitous, but it does show that it's increasingly worth publishing your data in re-usable formats. They show an example of address being picked up from microformats in apartment listings and mapped for the user – that kind of mashup was possible before and they're a huge step forward in themselves, but how many users have the skills and time to do it? Being able to use natural language to pull together and use data could bring mash-ups to the general public in a massive way.
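To show the shape of that address-to-map mashup, here is a Python sketch that pulls an address out of adr microformat markup (the class names `street-address`, `locality`, `region`, `postal-code` and `country-name` are from the adr microformat) and builds a map-search link. The listing HTML is invented, and the map URL is a hypothetical placeholder rather than any particular service's API.

```python
from html.parser import HTMLParser
from urllib.parse import quote_plus

# Property class names from the adr microformat.
ADR_FIELDS = ("street-address", "locality", "region", "postal-code", "country-name")


class AdrParser(HTMLParser):
    """Collects adr fields, assuming each field is a leaf element such as a <span>."""

    def __init__(self):
        super().__init__()
        self.address = {}
        self._field = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in ADR_FIELDS:
            if name in classes:
                self._field = name

    def handle_data(self, data):
        if self._field:
            self.address[self._field] = self.address.get(self._field, "") + data

    def handle_endtag(self, tag):
        self._field = None


def map_link(address, base="https://maps.example.org/?q="):
    """Build a map-search URL (hypothetical service) from the extracted fields."""
    parts = [address[name].strip() for name in ADR_FIELDS if name in address]
    return base + quote_plus(", ".join(parts))


LISTING = ('<div class="adr"><span class="street-address">1 Example Street</span>, '
           '<span class="locality">Melbourne</span> '
           '<span class="country-name">Australia</span></div>')

adr = AdrParser()
adr.feed(LISTING)
adr.close()
link = map_link(adr.address)
```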