APIs – Page 2 – Open Objects

'Cosmic Collections' launches at the Science Museum this weekend

I think I've already said pretty much everything I can about the museum website mashup competition we're launching around the 'Cosmos and Culture' exhibition, but it'd be a bit silly of me not to mention it here since the existence and design of the project reflects a lot of the issues I've written about here.

If you make it along to the launch at the Science Museum on Saturday, make sure you say hello – I should be easy to find cos I'm giving a quick talk at some point.

Right now the laziest thing I could do is to give you a list of places where you can find out more:

You can RSVP at eventbrite or simply find out more about the Cosmic Collections launch event and about the mashup competition.
You can read 'A new API and hack competition – this time not from a tech company but by a museum!', an interview with Chris Heilmann on the Yahoo Developer Network blog.
There are two separate interviews: one with me, 'Cosmic Collections: the geeky stuff', and one with Ali Boyle, the Curator of Astronomy, 'Background on our Cosmos & Culture exhibition'. (My apologies to the readers of the collections blog for my nerdtastic interruption. And that was me trying to speak like a normal person – tragic, really.)
You can also ask questions about it, connect with other participants, share tips, etc on the competition wiki.

Finally, you can talk to us @coscultcom on twitter, or tag content with #coscultcom.

Btw – if you want an idea of how slowly museums move, I think I first came up with the idea in January (certainly before dev8D because it was one of the reasons I wanted to go) and first blogged about it (I think) on the museum developers blog in March. The timing was affected by other issues, but still – it's a different pace of life!

Museum pecha kucha night

The first museum pecha kucha night was held in London at the British Museum on June 18, 2009. I took rough notes during the presentations, and have included the slides and notes from my own presentation. The event used the tag 'mwpkn' to gather together tweets, photos, etc. The focus of this first museum pecha kucha was on sharing insights and inspiration from the Museums and the Web conference held in Indianapolis in April.

The event was organised by Shelley Mannion, who introduced the event, emphasising that it was about fun and connecting the museum tech community in an interesting way.

Gail Durbin (V&A), takeaways from MW2009
She's a practical person, looks for ideas to nick. Good idea as things get hazy after a conference, good intentions disappear.

First takeaway – Dina Helal let her play with her iPhone, decided she had to have one. She liked her mobile for the first time in her life.

Second – twittering was very important. Decided to do something with it. Twittering is hard, sending out messages that are interesting is difficult.

Enthusiasm at conferences is short lived – e.g. people excited about wedding site, but did they send in wedding photos? She talked to people about a self-portraiture idea, 'life on a postcard', but hasn't had a single response.

RSS feeds – came away knowing we had to review our RSS feeds, had been without attention for a long time.

Learnt that wikis are very hard work, they don't automatically look after themselves.
Creative use of Flickr – museum 'my karsh' collection

Resolved that had to work with Development. Looking at something like the British Library's – adopt a book for fathers day.

Something that bothers her – many museums think of 'Web 2.0' just as more channels to push out information, there's no sense of pulling in information about visitors.

Beck Tench, one of the most interesting people she met at the conference – practice and work go together very closely. Flickr plant project. She wants to get staff involved – has meeting on Fridays, in local bar, tweets to everyone, conducts something called Experimonth.

Last thing learnt – librarians have better cakes.

Silvia Filippini Fantoni (British Museum and Sorbonne University)
Silvia makes a plea for extra seconds as a non-native speaker (and synthesis not the best feature of Italians). Lecturer in museum informatics and evaluation methods at Sorbonne and project manager for multimedia guide project at British Museum.

So her focus at the conference was mostly on guides. Particularly Samis and Pau and others. Mini workshops and workshops on the topic before and during the conference. Demos from Paul Clifford (Museum of London). Exhibitors. Lots of museums are planning to develop applications.

Interest in using mobile technology as an interpretive tool is constantly growing, especially delivered on visitors own devices. Proliferations of mobile platforms. Proliferation of different functionalities – not just audio – visual, games, way finding, web access and communication, notes and comments. Have all these new platforms and functionalities improved the visitor experience? Yes, but there are some disadvantages.

Asks: aren't we trying to do too much? Are we trying to turn a useful interpretive tool into something too complex? Aren't we forgetting about core audio guide audience?

Are people interested in using their own devices? Do they have the time to pre-download, do they bring their devices? Samis and Pau – the answer is no/not yet. For the medium and short term still need to provide media in the museums. Touch screen devices are easier to use. Limited functionality makes interface simpler. Focus on content – AV messages, touch and listen.
Importance of sharing and learning from best practice. Some efforts at and after MW2009 – handheldconference.org. Discussion of developing open source content management system for mobile devices – contact Nancy Proctor.

Daniel Incandela (Indianapolis Museum of Art)
He's from America so should have extra time too. Also sick and medicated (so at least one of us will have a good time during the presentation).

Enjoys robots, dinosaurs, football and a good point. On holiday while here.

Slide – Shelley's twitter profile – she's responsible for him being here while on holiday.

He blogged about preparing for the presentation and got a comment from one of the pecha kucha founders – the main thing is to have fun, be passionate about something you love.

Twitterfall on the big screen was a major breakthrough at MW2009, (#mw2009 trended as a topic and attracted the attention of) pantygirl.

Digital story telling and tech can't happen without support, Max Anderson has been dream leader.

He's here representing IMA so going to showcase some projects – Roman Art from Louvre webisodes – paved the way for informal, agile, multiple content source creation.

Art Babble. IMA blog – ripped off other museums – gives many departments from museum a digital voice.

Half time experiment with awkward silence (blank slide). [In the pub afterwards, I discovered that this actually made at least one of the English people feel socially awkward!]

Brooklyn Museum – for him the real innovators for digital content for museums, won many awards at MW2009.

Te Papa's 'build a squid' had him at 'hello'. First example of a museum project that actually went viral?

Perhaps we could upgrade MW site? Better integration of social media, multimedia from previous conferences.

Loves Bruce Wyman – reason to go to MW2010.

art:21 – smart team, good approaches to publishing across platforms.

Wonders about agility – love new and emerging projects (?) we hear about at conferences, but how do we face an idea and deal with own internal issues?

The Dutch at Indy (were great) – but somewhere outside north America next for Museums and the Web?

Philip Poole (British Museum)
Everything I got from MW2009 can be put into one statement – spread it about. Enable your content to be spread by other people through APIs.

Does spreading out content dilute our authority? By putting it onto other websites, putting it in contact with other people. No, of course not.

Video was big at MW2009.

If going to use different platforms, will people come? We need to tailor content to different websites – can't just build it and assume people will come. Persian coins vs. ritual Mayan sacrifice on YouTube – which will get bigger audience? [Pick content delivery to suit audience and context.]

Platforms include ArtBabble, YouTube (shorter, edgier), iTunes U. Viral content – we can put features on our website, but a YouTube or Vimeo audience are going to spread things better. With iTunes U, people can download and listen on the train – takes content out of the website entirely.

Stats are important – e.g. need to include stats of video on different platforms, make sure people above you recognise the value in that. DCMS – very basic stats – perhaps they should be asking for different stats. "If DCMS ask how much video we put on YouTube, we'd all start doing it." [Brilliant point]

API – take content from website and put elsewhere. IMA Explore section – advertise the repeating pattern in their URLs – someone used them but wasn't going very well, they got in contact with him and helped him succeed, now biggest referrer outside search engines. He wants to do that for the British Museum – he knows the quirks, the data.

Why the 'softly softly' approach? Creating an entire API interface is huge mountain, people above you will want to avoid it if you show them the size of the whole mountain.

Digital NZ – fantastic example. Can create custom search, embed on website, also into gallery and people can vote for it

The British Museum is a museum of the world for the world, why should their web presence be any different?

Mia Ridge (Science Museum)
Yes, that's me. My slides on 'Bubbles and Easter eggs – Museum Pecha Kucha' are on slideshare – scroll down the page for full text and notes – or available as a PDF (2mb).

I talked about:

keeping the post-conference momentum going, particularly the 'do one thing' idea;
museum technologists as 'double domain experts';
not hiding museum geeks like Easter eggs but making more of them as a resource;
the responsibilities of museum geeks as their expertise is recognised;
breaking down internal silos; intelligent failure;
broken metrics and better project design (pitch the goal, not the method);
audience expectations in 2009;
possible first questions for digital projects and taking a whole museum view for new projects;
who's talking/listening to your audiences? trust and respect your audiences;
your museum is an iceberg (lots of the good stuff is hidden);
(s)mash the system (hold a mashup day);
and a challenge for your museum – has the web fundamentally changed your organisation?

Frankie Roberto (Rattle)
Went to the conference with a 'fan' hat on, just really enjoys museums. Loved the zoo – live exhibits are interactive, visceral. Role of live interpretation – how could it work with digital technology? Everyone loves dinosaur – Indy Children's Museum. All museums should have a carousel (can't remember what he was going to say about it).

The Power of Children; making a difference – really powerful stories.

Still thinking about the idea of creating visceral experiences.

ArtBabble – shouldn't generally create silos but ArtBabble spotted that YouTube wasn't working for certain types of content.

Davis LAB – kiosks and sofa. Said 'we are on the web'.

Drupal – lots of museums switching to it.

Richard Morgan (V&A) on APIs – ask, what is your museum good at?, and build an API for that – it may not be collections stuff.

'Things to do' page on V&A. Good way of highlighting ways to interact on website.

Semantic data, Aaron's talk on interpretation of bias, relocation from Flickr photos.
Breaking down ideas about authority on where an area is bounded by. OpenStreetMap – wants to add a historical layer to that so can scroll backwards and forwards in time. [I should ask whether this means layering old maps (with older street layouts like pre-Great Fire of London, or earlier representations?). Geo-rectification is expensive because it's time-consuming, but could it be crowdsourced? Geo-locating old images would be easier for the average person to do.]

Open Plaques – alpha project.

Thinks we won't need to digitise in the future as stuff will be born digital (ha, as if! Though it depends where you draw the lines about the end of collections – in my imagination they're like that warehouse scene at the end of Indiana Jones and the Raiders of the Lost Ark and we won't run out of things to properly digitise any time soon. Still, it's a useful question.)

Dan Zambonini (Box UK)
'Every film needs a villain'. In his impressions and insights from MW2009 he'll say things we may or may not agree with.

Slide – stuff we can do vs. stuff we can't do on either side of a gulf of perceived complexity. It's hard to progress from one to the other. Three questions to bridge gap – how to make relevant to everyday job, how to show advantages, how to make it easy.

Then he realised should talk about personal things – people and connections made. About people, stuff that happens in the evening. The evening drinks don't happen at UKMW – it's a shame we have to go to the other side of the world to talk to each other. [It does if you're at an event like mashed museum the day before – another reason to open it up to educators, curators, etc.]

Small museums vs. big museums – [should make stuff accessible to small museums.] Can get value by helping people. (He tells his ex-girlfriend that) small is the new big. Also small quick wins. Break down the big things into smaller things, find ways can get to them through small changes in behaviour, bits of information.

How small is small? Greater or less than one day. If less than a day, might as well try it. If it's going to take a week, not small.

Museums should share data – not just as API – share data on traffic, spill gossip on marketing costs, etc. [Information is power, etc]

Celebrate failure – admit that some things go wrong.

Bigger picture – be honest. Tell us when to shut up (on e.g. the

If not on twitter, get on it. The more people talking to each other, the more powerful we are as a group. [But what happens if you miss a few days of twitter? I like twitter, but it's inaccessible if you don't have time to constantly keep up, or don't have a computer at home. Still, getting more people talking is an excellent point, even if twitter itself doesn't work for some people.]

The sector is missing a practical, specific blog, not news and opinions. [Do collections system specific user groups take the place of blogs?]

Use grants to innovate and produce open source stuff. Right now private agencies will take a lot of the strain of applying for grants.

Sort out that copyright stuff. How difficult can it be?

Final slide summing up and last bit of innuendo. 'Beer makes you more attractive' – it's the after sessions stuff at conferences that's so valuable.

Frankie, Dan and Daniel's slides are also available in the 'Museum Tech Pecha Kucha' event on slideshare (and mine has now got an audio track, thanks to Shelley).

Final thoughts on open hack day (and an imaginary curatr)

I think hack days are great – sure, 24 hours in one space is an artificial constraint, but the sheer brilliance of the ideas and the ingenuity of the implementations is inspiring. They're a reminder that good projects don't need to take years and involve twenty circles of sign-off, even if that's the reality you face when you get back to the office.

I went because it tied in really well with some work projects (like the museum metadata mashup competition we're running later in the year or the attempt to get a critical mass of vaguely compatible museum data available for re-use) and stuff I'm interested in personally (like modern bluestocking, my project for this summer – let me know if you want to help, or just add inspiring women to freebase).

I'm also interested in creating something like a Dopplr for museums – you tell it what you're interested in, and when you go on a trip it makes you a map and list of stuff you could see while you're in that city.

Like: I like Picasso, Islamic miniatures, city museums, free wine at contemporary art gallery openings, [etc]; am inspired by early feminist history; love hearing about lived moments in local history of the area I'll be staying in; I'm going to Barcelona.

The 'list of cultural heritage stuff I like' could be drawn from stuff you've bookmarked, exhibitions you've attended (or reviewed) or stuff favourited in a meta-museum site.

(I don't know what you'd call this – it's like a personal butlr or concierge who knows both your interests and your destinations – curatr?)

The talks on RDFa (and the earlier talk on YQL at the National Maritime Museum) have inspired me to pick a 'good enough' protocol, implement it, and see if I can bring in links to similar objects in other museum collections. I need to think about the best way to document any mapping I do between taxonomies, ontologies, vocabularies (all the museumy 'ies') and different API functions or schemas, but I figure the museum API wiki is a good place to draft that. It's not going to happen instantly, but it's a good goal for 2009.

These are the last of my notes from the weekend's Open Hack London event; my notes from various talks are tagged openhacklondon.

Tom Morris, SPARQL and semweb stuff – tech talk at Open Hack London

Tom Morris gave a lightning talk on 'How to use Semantic Web data in your hack' (aka SPARQL and semantic web stuff).

He's since posted his links and queries – excellent links to endpoints you can test queries in.

Semantic web often thought of as long-promised magical elixir, he's here to say it can be used now by showing examples of queries that can be run against semantic web services. He'll demonstrate two different online datasets and one database that can be installed on your own machine.

First – dbpedia – scraped lots of wikipedia, put it into a database. dbpedia isn't like your average database, you can't draw a UML diagram of wikipedia. It's done in RDF and Linked Data. Can be queried in a language that looks like SQL but isn't. SPARQL – is a w3c standard, they're currently working on SPARQL 2.

Go to dbpedia.org/sparql – submit query as post. [Really nice – I have a thing about APIs and platforms needing a really easy way to get you to 'hello world' and this does it pretty well.]

[Line by line comments on the syntax of the queries might be useful, though they're pretty readable as it is.]

'select thingy, wotsit where [the slightly more complicated stuff]'

Can get back results in xml, also HTML, 'spreadsheet', JSON. Ugly but readable. Typed.
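To make the 'select thingy, wotsit where ...' shape a bit more concrete, here is a minimal Python sketch of sending a query to the dbpedia endpoint mentioned above and flattening the JSON that comes back. The query itself, the `format` parameter and the helper names (`build_request`, `parse_results`, `run_query`) are my own assumptions for illustration, not something shown in the talk. The network call is kept in its own function so the rest can be tried offline.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the talk

# Hypothetical example query following the 'select ... where ...' shape.
QUERY = """
SELECT ?thing ?label WHERE {
  ?thing a <http://dbpedia.org/ontology/Museum> ;
         <http://www.w3.org/2000/01/rdf-schema#label> ?label .
} LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a POST request carrying the query as form data."""
    body = urllib.parse.urlencode({
        "query": query,
        # Assumption: the endpoint accepts a 'format' parameter for JSON.
        "format": "application/sparql-results+json",
    }).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, method="POST")


def parse_results(raw_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(raw_json)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_request(query)) as response:
        return parse_results(response.read().decode("utf-8"))
```

Calling `run_query(QUERY)` needs a live connection; `parse_results` can be tried on any saved response in the standard SPARQL JSON results layout.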

[Trying a query challenge set by others could be fun way to get started learning it.]

One problem – fictional places are in Wikipedia e.g. Liberty City in Grand Theft Auto.

Libris – how library websites should be
[I never used to appreciate how much most library websites suck until I started back at uni and had to use one for more than one query every few years]

Has a query interface through SPARQL

Comment from the audience BBC – now have SPARQL endpoint [as of the day before? Go BBC guy!].

Playing with mulgara, open source java triple store. [mulgara looks like a kinda faceted search/browse thing] Has own query language called TQL which can do more interesting things than SPARQL. Why use it? Schemaless data storage. Is to SQL what dynamic typing is to static typing. [did he mean 'is to sparql'?]
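For anyone wondering what 'schemaless' means in practice, here is a toy in-memory triple store in Python. It is only a sketch of the general idea: it is not mulgara, it doesn't speak TQL or SPARQL, and the class name and sample identifiers are made up for the example.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        # No schema: any predicate can be attached to any subject at any time.
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


store = TripleStore()
store.add("ex:museum1", "rdf:type", "ex:Museum")
store.add("ex:museum1", "ex:city", "London")
# A property nobody planned for needs no table change, just another triple.
store.add("ex:museum1", "ex:hasCafe", "yes")

print(store.match(subject="ex:museum1", predicate="ex:city"))
```

The point of the comparison with dynamic typing is the third `add` call: nothing had to be declared up front before a new kind of fact could be stored.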

Question from audience: how do you discover what you can query against?
Answer: dbpedia website should list the concepts they have in there. Also some documentation of categories you can look at. [Examples and documentation are so damn important for the uptake of your API/web service.]

Coming soon [?] SPARUL – update language, SPARQL2: new features

The end!

[These are more (very) rough notes from the weekend's Open Hack London event – please let me know of clarifications, questions, links or comments. My other notes from the event are tagged openhacklondon.

Quick plug: if you're a developer interested in using cultural heritage (museums, libraries, archives, galleries, archaeology, history, science, whatever) data – a bunch of cultural heritage geeks would like to know what's useful for you (more background here). You can comment on the #chAPI wiki, or tweet @miaridge (or @mia_out). Or if you work for a company that works with cultural heritage organisations, you can help us work better with you for better results for our users.]

There were other lightning talks on Pachube (pronounced 'patchbay', about trying to build the internet of things, making an API for gadgets because e.g. connecting hardware to the web is hard for small makers) and Homera (an open source 3d game engine).

Mashups made of messages – tech talk at Open Hack London

More (very) rough notes from the weekend's Open Hack London event – please let me know of clarifications, questions, links or comments. You can also check out other posts here tagged openhacklondon.

Mashups made of messages, Matt Biddulph (Dopplr)

Systems architecture on Dopplr lets them combine 3rd party systems with their stuff without tying their servers up in knots.

At a rough count, Dopplr uses about 25 third party web APIs.

If you're going to make a web service, site, concentrate on the stuff you're good at. [Use what other people are good at to make yours ace.]

But this also means you're outsourcing part of your reliability to other people. For each bit of service you add, network latency [is?] putting another bit of risk into your web architecture. Use messaging systems to make server side stuff asynchronous.

'&' is his favourite thing about Linux. Fundamental in Unix that work is divided into packets; each doing the thing it does well. Not even very tightly coupled. Anything that can be run on the command line, stick & on the end, do it in the background. Can forget about things running in the background – don't have to manage the processes, it's not tightly coupled.

Nothing in web apps is simple these days – lots of interconnected bits.

In the physical world, big machines use gearing – having different bits of system run at different speeds. Also things can freewheel then lock in to system again when done.

When building big systems, there's a worry that one machine, one bit it depends on can bring down everything else.

[Slide of a] Diagram of all the bits of the system that don't run because someone has sent an HTTP request – [i.e. background processes]

Flickr is doing less database work up front to make pages load as quickly as possible. They queue other things in the background. e.g. photos load, tags added slightly later. (See post 'Flickr engineers do it offline'.)

Enterprise Integration Patterns (Hohpe et al) is a really good book. Banks have been using messaging for years to manage the problems. Atomic packets of data can be sent on a channel – 'Email for applications'.

Designing – think about what needs to be done now, what can be done in the background? Think of it as part of product design – what has instant effect, what has slower effect? Where can you perform the 'sleight of hand' without people noticing/impacting their user experience?

Example using web services 1: Dopplr and AMEE. What happens when someone asks to see their carbon impact? A request for carbon data goes to Ruby on Rails (memory hungry, not the fastest thing in the world, try to take things off that and process elsewhere). Refresh user screen 'check back soon', send request to message broker (in JSON). Worker process connected to message broker sends request to AMEE. Update database.
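The flow in that first example (web request, JSON message to a broker, worker calls the slow service, database gets updated) can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions: `queue.Queue` stands in for a real message broker, a dict stands in for the database, and `fetch_carbon_data` is a hypothetical placeholder for the third-party call. It is not Dopplr's code.

```python
import json
import queue
import threading

broker = queue.Queue()   # stand-in for a real message broker
database = {}            # stand-in for the application database


def fetch_carbon_data(trip):
    """Hypothetical placeholder for the slow third-party API call."""
    return {"trip": trip, "status": "calculated"}


def worker():
    """Consume JSON messages and write results to the database."""
    while True:
        message = broker.get()
        if message is None:          # sentinel: shut the worker down
            broker.task_done()
            break
        request = json.loads(message)
        database[request["user"]] = fetch_carbon_data(request["trip"])
        broker.task_done()


def handle_web_request(user, trip):
    """The web tier only enqueues a message, then replies immediately."""
    broker.put(json.dumps({"user": user, "trip": trip}))
    return "check back soon"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
reply = handle_web_request("alice", "London to Barcelona")
broker.join()        # demo only: wait for the background work to finish
broker.put(None)
thread.join()
print(reply, database["alice"])
```

The web-facing function never waits on the slow call, which is the 'sleight of hand' described above; only the demo code at the bottom blocks, so that there is something to print.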

Example using web services 2: Flickr pictures on Dopplr page. When you request a trip page, the page loads with all usual stuff and empty div in page with a piece of Javascript on a timer that polls Flickr.

Keeps open connection, a way to push messages to the client while it's waiting to do something.

When processing lots of stuff, worker processes write to memcache as a form of progress bar, but the process is actually disconnected from the webserver so load/risk is outsourced.
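A rough sketch of the progress-bar idea, again in Python and again with stand-ins: a plain dict plays the part of memcache, and the function names are mine. The worker records how far it has got after each item, and the web tier only ever reads that record.

```python
import threading

cache = {}                     # stand-in for memcache
cache_lock = threading.Lock()


def process_items(job_id, items):
    """Worker: do the slow work and record progress after each item."""
    for done, _item in enumerate(items, start=1):
        # ... the real (slow) per-item processing would happen here ...
        with cache_lock:
            cache[job_id] = {"done": done, "total": len(items)}


def progress_percent(job_id):
    """Web tier: read progress from the cache without touching the worker."""
    with cache_lock:
        state = cache.get(job_id)
    if state is None:
        return 0
    return int(100 * state["done"] / state["total"])


job = threading.Thread(target=process_items, args=("job-1", ["a", "b", "c", "d"]))
job.start()
job.join()                     # demo only: a real page would poll instead
print(progress_percent("job-1"))
```

Because the only shared thing is the cache entry, the webserver can report progress even though it has no handle on the worker process.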

'Sites built with glue and string don't automatically scale for free.' You can have many webservers, but the bottleneck might be in the database. Splitting work into message queues is a way of building so things can scale in parallel.

Slide of services, companies that offer messaging stuff. [Did anyone get a photo of that?]

Because of abstraction and with things happening in the background, it's a different flow of control than you might be used to – monitoring is different. You can't just sit there with a single debugger.

[Slide] "If you can't see your changes take effect in a system your understanding of cause and effect breaks down" – not just about it being hard to debug, it's also about user expectations.

I really liked this presentation – it's always good to learn from people who are not only innovating, but are also really solid on performance and reliability as well as the user experience.

[Update: a version of this talk is on the Dopplr blog with slides and notes.]

Rasmus Lerdorf on Hacking with PHP – tech talk at Open Hack London

Same deal as my first post from today's Open Hack London event – these are (very) rough notes, please let me know of clarifications, questions or comments.

Hacking with PHP, Rasmus Lerdorf

Goal of talk: copy and pastable snippets that just work so you don't have to fight to get things that work [there's not enough of this to help beginners get over that initial hump]. The slides are available at http://talks.php.net/show/openhack and these notes are probably best read as commentary alongside the code examples.

[Since it's a hack day, some] Hack ideas: fix something you use every day; build your own targeted search engine; improve the look of search results; play with semantic web tools to make the web more semantic; tell the world what kind of data you have – if a resume, use hResume or other appropriate microformats/markup; go local – tools for helping your local community; hack for good – make the world a better place.

SearchMonkey and BOSS are blending together a little bit.

What we need to learn
With PHP – enough to handle simple requests; talk to backend datastore; how to parse XML with PHP, how to generate JSON, some basic JavaScript, a JavaScript utility library like YUI or jQuery.

parsing XML: simplexml_load_file() – can load entire URL or local file.

Attributes on node show up as array. Namespace attributes call children of node, name namespace as argument.
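Rasmus's slides have the PHP version. Purely as a reading aid, here is the same idea (attributes as an array-like lookup, namespaced children fetched by naming the namespace) sketched with Python's standard `xml.etree.ElementTree`. The sample document is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
SAMPLE = """
<feed xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <item id="42">
    <title>Example place</title>
    <geo:lat>51.5</geo:lat>
    <geo:long>-0.1</geo:long>
  </item>
</feed>
"""

NAMESPACES = {"geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"}

root = ET.fromstring(SAMPLE)
item = root.find("item")

print(item.attrib["id"])                                  # attributes behave like a dict
print(item.findtext("title"))                             # plain child element
print(item.findtext("geo:lat", namespaces=NAMESPACES))    # namespaced child
```

`ET.parse()` does the equivalent job for a local file; loading straight from a URL would need a fetch step first.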

Now know how to parse XML, can get lots of other stuff.
Context extraction service, Yahoo – doesn't get enough attention. Post all text, gives you back four or five key terms – can then do an image search off them. Or match ads to webpages.

Can use get or post (curl) – usually too much for get.

PHP to JavaScript on initial page load: json_encode -> javascript.
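The pattern is the same in any server-side language: serialise your data as JSON and write it into the page as a JavaScript variable. Here is a small Python sketch of it (the talk's own example is PHP and lives in the slides). The function name and the `initialData` variable are assumptions of mine.

```python
import json


def render_page(data):
    """Embed server-side data in the page as a JavaScript variable."""
    # Escape '</' so a string in the data cannot close the script tag early.
    payload = json.dumps(data).replace("</", "<\\/")
    return (
        "<html><body>\n"
        "<script>\n"
        f"var initialData = {payload};\n"
        "</script>\n"
        "</body></html>"
    )


page = render_page({"place": "London", "tags": ["museum", "hack day"]})
print(page)
```

The escaping line is a small precaution I've added; it isn't something covered in the talk.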

Javascript to PHP (and back)
If you can figure out these six lines of code, you can write anything in the world. How every modern web application works.
Server-side php, client-side javascript.

'There's nothing to building web applications, you just have to break everything down into small enough chunks that it all becomes trivial'.

AJAX in 30 seconds.
Inline comments in code would help for people reading it without hearing the talk at the same time.

JavaScript libraries to the rescue
load maps API, create container (div) for the map, then fill it.

Form – on submit call return updateMap(); with new location.

YGeoRSS – if have GeoRSS file… can point to it.

GeoPlanet – assigns a WOE ID to a place. Locations are more than just a lat long – carry way more information. Basically gives you a foreign key. YQL is starting to make the web a giant database. Can make joins across APIs – woeid works as fk.

YQL – 'combines all the APIs on the web into a single API'.
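To show what 'woeid works as fk' means, here is a tiny Python sketch that joins records from two unrelated services on a shared WOE ID. It doesn't call GeoPlanet or YQL at all; the records and ID numbers are made up, and the function name is mine.

```python
# Hypothetical records from two unrelated services; the IDs are made up.
places = [
    {"woeid": 1001, "name": "Example City"},
    {"woeid": 1002, "name": "Sample Town"},
]
photos = [
    {"title": "Museum entrance", "woeid": 1001},
    {"title": "Harbour at dusk", "woeid": 1002},
    {"title": "Gallery opening", "woeid": 1001},
]


def join_on_woeid(places, photos):
    """Group photo titles under the place name that shares their WOE ID."""
    names = {place["woeid"]: place["name"] for place in places}
    joined = {}
    for photo in photos:
        place_name = names.get(photo["woeid"])
        if place_name is not None:
            joined.setdefault(place_name, []).append(photo["title"])
    return joined


print(join_on_woeid(places, photos))
```

It's the same thing a database join does, except the two 'tables' came from different APIs that happen to agree on an identifier for the place.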

Add a cache – nice to YQL, and also good for demos etc. Copy and paste cache function from his slides – does a local cache on URL. Hashed with md5. Using PHP streams – #defn. Adding a cache speeds up developing when hacking (esp as won't be waiting for the wifi). [This is a pretty damn good tip cos it's really useful and not immediately obvious.]
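The real cache function is in the slides (in PHP). For reference, this is roughly what the technique looks like in Python: hash the URL with md5 to get a file name, and only go to the network when there is no fresh local copy. The cache directory, the `max_age` default and the injectable `fetcher` argument are my assumptions, not details from the talk.

```python
import hashlib
import os
import tempfile
import time
import urllib.request

CACHE_DIR = os.path.join(tempfile.gettempdir(), "hack_cache")  # assumed location


def fetch_url(url):
    """Default fetcher: a plain HTTP GET."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_fetch(url, max_age=3600, fetcher=fetch_url):
    """Return the body for url, reusing a local copy newer than max_age seconds."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    # The md5 of the URL gives a safe, fixed-length file name.
    path = os.path.join(CACHE_DIR, hashlib.md5(url.encode("utf-8")).hexdigest())
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, "rb") as cached:
            return cached.read()
    body = fetcher(url)
    with open(path, "wb") as cached:
        cached.write(body)
    return body
```

Passing your own `fetcher` makes it easy to try the cache without any wifi at all, which is rather the point at a hack day.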

XPath on URL using PHP's OAuth extension

SearchMonkey – social engineering people into caring about semantic data on the web. For non-geeks, search plug-in mechanism that will spruce up search results page. Encourages people to add semantic data so their search result is as sexy as their competitors – so goal is that people will start adding semantic data.

'If you're doing web stuff, and don't know about microformats, and your resume doesn't have hResume, you're not getting a job with Yahoo.'

Question: how are microformats different to RDFa?
Answer: there are different types of microformats – some very specific ones, e.g. hResume, hCal. RDFa – adding arbitrary tags to page, even if there's no specific way to describe your data. But there's a standard set of mark-ups for a resume so can use that. If your data doesn't match anything at microformats.org then use RDFa or eRDF (?).

RDFa, SearchMonkey – tech talks at Open Hack London

While today's Open Hack London event is mostly about the 24-hour hackathon, I signed up just for the Tech Talks because I couldn't afford to miss a whole weekend's study in the fortnight before my exams (stupid exams). I went to the sessions on 'Guardian Data Store and APIs', 'RDFa SearchMonkey', Arduino, 'Hacking with PHP', 'BBC Backstage', Dopplr's 'mashups made of messages' and lightning talks including 'SPARQL and semantic web' stuff you can do now.

I'm putting my rough and ready notes online so that those who couldn't make it can still get some of the benefits. Apologies for any mishearings or mistakes in transcription – leave me a comment with any questions or clarifications.

One of the reasons I was going was to push my thinking about the best ways to provide API-like access to museum information and collections, so my notes will reflect that but I try to generalise where I can. And if you have thoughts on what you'd like cultural heritage institutions to do for developers, let us know! (For background, here's a lightning talk I did at another hack event on happy museums + happy developers = happy punters).

RDFa – now everyone can have an API.
Mark Birkbeck

Going to cover some basic mark-up, and talk about why RDFa is a good thing. [The slides would be useful for the syntax examples, I'll update if they go online.]

RDFa is a new syntax from W3C – a way of embedding metadata (RDF) in HTML documents using attributes.

e.g. <span property="dc:title"> – value of property is the text inside the span.

Because it's inline you don't need to point to another document to provide source of metadata and presentation HTML.

One big advance is that can provide metadata for other items e.g. images, so you can e.g. attach licence info to the image rather than page it's in – e.g. <img src="" rel="licence" resource="[creative commons licence]">
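Until the slides turn up, here is a rough Python sketch of what a consumer of that mark-up might do: walk the HTML and pull out `property` values plus `rel`/`resource` pairs. It is a long way from a real RDFa parser (no prefixes, no `about`, no nesting), the page fragment is made up, and I've used the `license` spelling for the attribute value in the sample.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using the attributes described in the talk.
PAGE = """
<div>
  <span property="dc:title">Cosmic Collections</span>
  <span property="dc:creator">Example author</span>
  <img src="photo.jpg" rel="license"
       resource="http://creativecommons.org/licenses/by/3.0/">
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (name, value) pairs from property and rel/resource attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open_property = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes:
            self._open_property = attributes["property"]
        if "rel" in attributes and "resource" in attributes:
            self.pairs.append((attributes["rel"], attributes["resource"]))

    def handle_data(self, data):
        # The text inside the element is the value of the open property.
        if self._open_property and data.strip():
            self.pairs.append((self._open_property, data.strip()))
            self._open_property = None

    def handle_endtag(self, tag):
        self._open_property = None


parser = PropertyExtractor()
parser.feed(PAGE)
print(parser.pairs)
```

Even this crude version shows the appeal: the metadata and the licence for the image come straight out of the same HTML a person reads, with no second document to fetch.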

Putting RDFa into web pages means you've now got a feed (the web page is the RSS feed), and a simple static web page can become an API that can be consumed in the same way as stuff from a big expensive system. 'Growing adoption'.

Government department Central Office of Information [?] is quite big on RDFa, have a number of projects with it. [I'd come across the UK Civil Service Job Service API while looking for examples for work presentations on APIs.]

RDFa allows for flexible publishing options. If you're already publishing HTML, you can add RDFa mark-up then get flexible publishing models – different departments can keep publishing data in their own way, a central website can go and request from each of them and create its own database of e.g. jobs. Decentralised way of approaching data distribution.

Can be consumed by: smarter browsers; client-side AJAX; other servers such as SearchMonkey.

He's interested where browsers can do something with it – either enhanced browsers that could e.g. store contact info in a page into your address book; or develop JavaScript libraries that can parse page and do something with it. [screen shot of jobs data in search monkey with enhanced search results]

RDFa might be going into Drupal core.

Example of putting isbn in RDFa in page, then a parser can go through the page, pull out the triples [some explanation of them as mini db?], pull back more info about the book from other APIs e.g. Amazon – full title, thumbnail of cover. e.g. pipes.

Example of FOAF – twitter account marked up in page, can pull in tweets. Could presumably pull in newer services as more things were added, without having to re-mark-up all the pages.

Example of chemist writing a blog who mentions a chemical compound in blog post, a processor can go off and retrieve more info – e.g. add icon for mouseover info – image of molecule, or link to more info.

Next plan is to link with BOSS. Can get back RDFa from search results – augment search results with RDFa from the original page.

Search Monkey (what it is and what you can do with it)
Neil Crosby (European frontend architect for search at Yahoo).

SearchMonkey is (one of) Yahoo's open search platforms (along with BOSS). Uses structured data to enhance search results. You get to change stuff on Yahoo search results page.

SearchMonkey lets you: style results for certain URL patterns; brand those results; make the results more useful for users.

[examples of sites that have done it to see how their results look in Yahoo? I thought he mentioned IMDb but it doesn't look any different – a film search that returns a wikipedia result, OTOH, does.]

Make life better for users – not just what Yahoo thinks results should be, you can say 'actually this is the important info on the page'

Three ways to do it [to change the SERP (search engine results page)]: mark up data in a way that Yahoo knows about – 'just structure your data nicely'. e.g. video mark-up; enhance a result directly; make an infobar.

Infobar – doesn't change the result you see immediately on the page, but it opens on the page. e.g. of auto-enhanced result – playcrafter. Link to developer start page – how to mark it up, with examples, and what it all means.

User-enhanced result – Facebook profile pages are marked up with microformats – can add as friend, poke, send message, view friends, etc from the search results page. Can change the title and abstract, add image, favicon, quicklinks, key/value pairs. Create at [link I can't see but is on slides]. Displayed on screen, you fill it out on a template.

Infobar – dropdown in grey bar under results. Can do a lot more, as it's hidden in the infobar and doesn't have to worry people.

Data from: microformats, RDF, XSLT, Yahoo's index, and soon, top tags from delicious.

If no machine data, can write an XSLT. 'isn't that hard'. Lots of documentation on the web.

Examples of things that have been made – a tool that exposes all the metadata known for a page. URL on slide. can install on Yahoo search page, add it in. Use location data to make a map – any page on web with metadata about locations on it – map monkey. Get qype results for anything you search for.

There's a mailing list (people willing and wanting to answer questions) and a tutorial.

Questions

Question: do you need to use a special doctype [for RDFa]?
Answer: added to spec that 'you should use this doctype' but the spec allows for RDFa to be used in situations when can't change doctype e.g. RDFa embedded in blogger blogpost. Most parsers walk the DOM rather than relying on the doctype.

Jim O'D – excited that SearchMonkey supports XSLT – if have website with correctly marked up tables, could expose those as key/value pairs?
Answer: yes. XSLT fantastic tool for when don't have data marked up – can still get to it.
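The answer was about XSLT inside SearchMonkey, which I can't reproduce here, but the underlying idea (a properly marked up table is already key/value data) can be sketched in plain Python. This is a swapped-in technique, not XSLT and not anything SearchMonkey itself runs; the class name and the sample table values are invented for illustration, and it assumes simple two-cell rows.

```python
from html.parser import HTMLParser


class TableKeyValueParser(HTMLParser):
    """Hypothetical helper: reads two-cell table rows as key/value pairs."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._row = None   # cell texts for the row being read, or None
        self._cell = None  # text fragments for the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Only rows with exactly a label cell and a value cell are kept.
            if len(self._row) == 2:
                key, value = self._row
                self.pairs[key] = value
            self._row = None
            self._cell = None


# Made-up object record, marked up as a table.
SAMPLE_TABLE = """
<table>
  <tr><th>Object</th><td>Telescope</td></tr>
  <tr><th>Date</th><td>1780</td></tr>
</table>
"""

table_parser = TableKeyValueParser()
table_parser.feed(SAMPLE_TABLE)
print(table_parser.pairs)
```

Rows that don't fit the label/value shape are skipped rather than guessed at, which seems the safer default for messy museum pages.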

Frankie – question I couldn't hear. About info out to users?
Answer: if you've built a monkey, up to you to tell people about it for the moment. Some monkeys are auto-on e.g. Facebook, wikipedia… possibly in future, if developed a monkey for a site you own, might be able to turn it auto-on in the results for all users… not sure yet if they'll do it or not.
Frankie: plan that people get monkeys they want, or go through gallery?
Answer: would be fantastic if could work out what people are using them for and suggest ones appropriate to people doing particular kinds of searches, rather than having to go to a gallery.

Christian Heilmann on Yahoo!'s YQL, open data tables, APIs

My notes from Christian Heilmann's talk on 'Reaching those web folk' with Yahoo!'s new-ish YQL, open data tables and APIs at the National Maritime Museum [his slides]. My notes are a bit random, but might be useful for people, especially the idea of using YQL as an easy way to prototype APIs (or implement APIs without too much work on your part).

For him it's about data on the web, not just technology.

Number of users is a crap metric, [should consider the user experience].

Stats should be what you use to discover where the problems are, not to pat yourself on the back.

People with blackberries have no JavaScript, no CSS. Don't have front-loading navigation they have to scroll through – cos they won't.

If you think of your site as content, then visitors can become 'broadcasting stations' and relay your message. Information flows between readers and content. They're passing it on through distribution channels you're not even aware of.

Content on the web is validated with links and quotes from other sources e.g. Wikipedia. People mix your information with other sources to prove a point or validate it. eg. photos on maps.

How can you be part of it?
Make it easy to access. Structure your websites in a semantic manner (plain old semantic HTML). Title is important, etc. Add more semantic richness with RDF and microformats. Provide data feeds or RSS. Consider the Rolls Royce of distribution – an API. Help other machines make sense of your content – search engines will love you too.

Yahoo index via BOSS API – Yahoo do it because they know 'search engines are dying'. Catch-all search engines are stupid. Apples are not the same apples for everyone. Build a cleverer web search.

http://ask-boss.appspot.com/ – nlp analysis of search results. Try 'who is batman in the dark knight' – amazing.

BOSS provides mainstream channel for semantic web and microformats. Microformats are chicken and egg problem. Using searchmonkey technology, BOSS lists this information in the results. BOSS can return all known information about a page, structured.

Key terms parameter in BOSS – what did people enter to find a site/page? http://keywordfinder.org/ – what successful websites have for a given keyword.

Clean HTML is the most important thing, semantic and microformats are good.

If your data is interesting enough, people will try to get to it and remix it.

[Curl has grown up since I last used it! Can be any browser, do cookies, etc.]

Now the web looks like an RSS reader.

Include RSS in your stats.

Guardian – any of their content websites put out RSS through CMS. They then provided an API so end users can filter down to the data they need.

Programmable Web – excellent resource but can be overwhelming.

The more data sources you use, the more time you spend reading API documentation, cos every API is different. Terms, formats, etc. The more sources you connect to, the more chances of error. The more stuff you pull in, the slower the performance of your website.

So you need systems to aggregate sources painlessly. Yahoo Pipes. A visual interface, changes have to be made by hand.

You can't quickly use a pipe in your code and change it on the fly. e.g. change a parameter for one implementation. No version control.

So that's one of the reasons for YQL: Yahoo Query Language. SQL style interface to all yahoo data (all Yahoo APIs) and the web. Yahoo build things with APIs cos it's the only way to scale. Book: 'scalable websites', all about APIs.

Build queries to Yahoo APIs, try them out in YQL console. Provides diagnostics – which URLs, how long it took, any problems encountered. Allows nesting of API calls.

Outputs XML or JSON, consistent format so you know how to use that information.
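As a rough idea of what 'SQL style' means in practice, here's a sketch that builds the request URL for a YQL query. Assumptions, loudly: the endpoint is the public YQL URL as I understand it, the feed address is a placeholder, and the function name is my own. The service may not be available to you, so the sketch only builds and inspects the URL rather than calling anything.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public YQL endpoint; check it still exists before relying on it.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, fmt="json"):
    """Return the GET URL for a YQL statement (hypothetical helper).

    fmt is the output format the notes mention: "xml" or "json".
    """
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


# Placeholder feed URL; 'rss' is one of YQL's built-in tables.
query = 'select title, link from rss where url="http://example.org/feed.xml"'
url = build_yql_url(query)
print(url)

# Round-trip check: the query string decodes back to the original statement.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == query)
```

The attraction is that the same pattern works whatever the source is: only the statement changes, not the way you call it or the shape of what comes back.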

YQL also helped internally because of varying APIs between departments.

Gives access to all Yahoo services, any data sources on the web, including html and microformats, and can scrape any website.

Open tables
Easy way to add own information to YQL. Tell Yahoo end point where can get the info.

Jim wanted to allow people to access data without building an API. All it needed was a simple XML file.

[Though you do need RSS results from a search engine to point to – I'm going to see what we can output from our Google Mini and will share any code – or would appreciate some time-saving pointers if anyone has any. Yes, hello, lazyweb, that's my coat, thanks.]

Basically it's a way of providing an API without having to develop one.

Concluding: you can piggyback on people's social connections with other people by making data shareable. [Then your data is shared, yay. Assuming your institution is down with that, and no copyrights or puppies were hurt in the process.]

APIs are a commitment – have to be available all the time, lot of traffic, but hard to measure traffic and benefits. Making APIs scale is a pain and have to be clever to do it. Pointing a YQL open data table at the search engine on your site also works.

Saves documenting API? [??]

YQL handles the interface, caching and data conversion for you. Also limits the access to sensible levels – 10,000 hits/hour.
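For a sense of what you'd otherwise be writing yourself, here's a minimal sketch of an hourly access limit like the one mentioned above. It is only an illustration of the idea, a simple fixed-window counter; I have no idea how Yahoo actually implement theirs, and the class name and parameters are my own.

```python
import time


class HourlyRateLimiter:
    """Hypothetical fixed-window limiter: at most `limit` hits per window."""

    def __init__(self, limit=10_000, window_seconds=3600, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock              # injectable so it can be tested
        self._window_start = clock()
        self._count = 0

    def allow(self):
        """Return True and count the hit if the caller is under the limit."""
        now = self._clock()
        if now - self._window_start >= self.window_seconds:
            # A new window has started, so the count resets.
            self._window_start = now
            self._count = 0
        if self._count < self.limit:
            self._count += 1
            return True
        return False


# Tiny demonstration with a limit of 3 instead of 10,000.
limiter = HourlyRateLimiter(limit=3)
results = [limiter.allow() for _ in range(5)]
print(results)
```

Even this toy version needs decisions about windows, clocks and what to tell a blocked caller, which is part of why handing it to someone else is attractive.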

Jim – 'images from collection' displayed on page as badge thing with YQL as RSS browser. Can just create an RSS feed for an exhibition, then create a new badge for each new exhibition.

Using YQL protects against injection attacks.

Comment from audience – YQL as meta-API.

Registering is basically making the XML file. You need a Yahoo ID to use the console. [The console is cool, basically like a SQL 'enterprise' system console, with errors and transaction processing costs.]

We had questions about adding in metrics, stats, to use both for reporting and keeping funders/bosses happy and for diagnostics – to e.g. find out which areas of the collection are being queried, what people are finding interesting.

github repository as place to register open tables to make them discoverable.

There's a YQL blog.

[So, that's it – it's probably worth a play, and while your organisation might not want to use it in production without checking out how long the service is likely to be around, etc, it seems like an easy way of playing with API-able data. It'd be really interesting to see what happened if a few museums with some overlap in their collections coverage all made their data available as an open table.]

Notes from the closing plenary, MW2009

These are my quick and dirty notes from the closing plenary of the 2009 Museums and the Web conference. If I've quoted you but gotten your name wrong, I'm very sorry – please let me know and I'll correct it. I haven't put links in for anyone yet so I'll be editing the entry anyway.

'We are the program.' Awards for blog posts, tweets, Flickr photos then David Bearman invited people to come up and talk about what they've learnt, what they'll take away.

Nina, Museum 2.0 – inspired by Max's keynote address. But she didn't feel that difference in the institution. Didn't see the transparency and openness that you get on the web, on their dashboard. Not saying they have to do that, but wants to bring up idea of participatory ghetto… forming relationships with visitors on the web, who'll show up at museums and wonder why the same relationship isn't reflected in the building. Pushing in institutions to establish parity, not to give up on physical space also being somewhere for openness and transparency. IMA – had experience of extreme cognitive dissonance. How can you start the conversation, taking great stuff from web world into physical environment of institutions. Her first time at MW.

Heather from Balboa – new to conference and museum world, great introduction.

Nate, Walker Art Center – I always leave inspired, seen it happen every time – a month's worth of trying new things, then it trickles off and fades… go to the wiki and take the post-conference challenge to do one thing in April – choose one task that you can achieve by the end of April. Distributed agile development … beyond API, everyone can benefit from going home and immediately doing just one thing. [eek I feel weird taking notes about my ideas]

Frankie, Rattle – be excited about tin mining.

Brian, UKOLN – danger that we're losing accessibility cos we're doing innovative things, but there have been some really great examples. Universally accessible – pushing (the definition of) it forward.

Seb, Powerhouse – need to bring people in, curators, management.

Julie (?) – boundaries between web and physical boundaries – problematising the name of the conference. Is 'web' starting to constrain what we're about?

Nina – comment on that – conference in US called WebWise – lousy content but less funded projects, mostly director level people who go. How do we get these people in a situation that's more blended with the kind of people who are here?

Victoria, Smithsonian? carrying on Nina and Seb's point – spends first month being excited, but directors etc aren't going to come to conferences like this. You may have five minutes to articulate why something is important – and it's not heard when it's someone outside, even if you've been saying it on the inside for years. Having someone who's succeeded from outside, doing snippets of video or whatever – convincing.

David – seeing what can share back. Spend time at conference demanding people write papers, share slides… would really love for the post-conference discussion that takes place online to incorporate thoughts, experience about what doing. Extension into social space of a discourse we've never really had – how do you use that post-conference excitement… how do organisations change, which is becoming the centre of the discourse… take it further, keep talking to each other about how do you make it work.

Jennifer – the thing we can do by the end of April, if you write a report, share it with your colleagues. Let people pinch your ideas, send it out. Share the reports as well as the stuff that happens when we're right here.

Jon Pratty – we need more social media within the museum.

Peter Samis – can remember this camaraderie in 1991… hearing it just as fresh now with people who are coming to their first conference, loving it… this is going to have legs, it's going to keep running, continue this spirit throughout the year.

Rich (another Rich) – haven't really felt the amount of community before, but have been coming since 1999. Being able to catch up on the things he missed while he was here.

Brian – people in the community can fall out, it's happened in the UK. People have strongly held views, need to depersonalise disputes, constructive criticism.

Scott (?) – we're not the only people talking about these subjects, it's happening in higher education, the commercial sector; not a whole lot of discussion here about what's happening out there and what impact it has here. Would be neat to do some headlines on what's going on in the world outside museums, add to the implications for this audience.

[This final session probably contributed quite a bit to my summary of MW2009 – I'd written the 'MW2009 challenge' a little while before (after discussions at the ice cream API meet) and it was wonderful to feel so much excitement (tempered with realistic cynicism) in the room about the positive changes we could make when we went back to our home institutions.]

Get thee to a wiki – the great API challenge in action

Help us work on an informal, lightweight way of devising shared data, API standards for museum and cultural heritage organisations – museum-api.pbwiki.com is open for business.

You could provide examples of APIs you've used or produced, share your experience as a consumer of web services, tell us about your collections.

Commenting on other people's queries and content is an easy way to get started. I'd particularly love to hear from curators and collections managers – we should be working together to enable greater access to collections. If you check it out and none of it makes any sense – be brave and say so! We should be able to explain what we're doing clearly, or we're not doing it right.

Some background: as announced on the nascent museumdev blog, the Science Museum is looking at releasing an API soon – it'll be project-specific to start with, but we're creating it with the intention of using that as an iterative testing and learning process to design an API for wider use. We could re-invent the wheel, but we'd rather make it easy for people to use what they've learnt using other APIs and other museum collections – the easiest way to do that is to work with other museums and developers. The Science Museum's initial public-facing collections API will be used for a 'mashup competition' based on object metadata from our 'cosmos and culture' gallery.

Speaking of museumdev, I started it as somewhere where I could ask questions, point people to discussions, a home for collections of links and stuff in development. It's also got random technical bits like 'Tip of the Day: saving web.config as Unicode' because I figure I might as well share my mistakes^H^H^H^H^H^H^H^H learning experiences in the hope that someone, somewhere, benefits.