Notes from Europeana's Open Culture Conference 2010

The Open Culture 2010 conference was held in Amsterdam on October 14 – 15. These are my notes from the first day (I couldn't stay for the second day). As always, they're a bit rough, and any mistakes are mine. I haven't had a chance to look for the speakers' slides yet so inevitably some bits are missing.  If you're in a hurry, the quote of the day was from Ian Davis: "the goal is not to build a web of data. The goal is to enrich lives through access to information".

The morning was MC'd by Costis Dallas, and there was a welcome and introduction from the chair of the Europeana Foundation before Jill Cousins (Europeana Foundation) provided an overview of Europeana. I'm sure the figures will be available online, but in summary, they've made good progress in getting from a prototype in 2008 to an operational service in 2010. [Though I have written down that they had 1 million visits in 2010, which is a lot less than many of the UK's national museums – though obviously those museums have had longer to establish a brand, and a large percentage of their stats are probably in the 'visit us' areas rather than the collections areas.]

Europeana is a super-aggregator, but that doesn't show the role of the national or thematic aggregators or portals as providers/collectors of content. They're looking to get away from a one-way model to the point where they can get data back out into different places (via APIs etc). They want to move away from being a single destination site to putting information where the user is, and to continue their work on advocacy, open source code etc.

Jill discussed various trends, including the idea of an increased understanding that access to culture is the foundation for a creative economy. She mentioned a Kenneth Galbraith [?] quote on spending more on culture in a recession, as that's where creative solutions come from [does anyone know the reference?]. Also, in a time of increasing nationalism, Europeana provides a counter-example of trans-European cooperation and culture. Finally, customer needs are changing as visitors move from passive recipients to active participants in online culture.

Europeana [or the talk?] will follow four paths – aggregation, distribution, facilitation, engagement.

  • Aggregation – build the trusted source for European digital cultural material. Source curated content, linked data, data enrichment, multilinguality, persistent identifiers. 13 million objects, but with an 18th–20th century dominance; only 2% of the material is audio-visual [?]. Looking towards publishing metadata as linked open data, to make Europeana and cultural heritage work on the web. An example of tagging content with controlled vocabularies: Vikings as tagged by Irish and Norwegian people – from 'pillagers' to 'loving fathers'. They can map between these vocabularies with linked data (see the sketch after this list).
  • Distribution – make the material available to the user wherever they are, whenever they want it. Portals, APIs, widgets, partnerships, getting information into existing school systems.
  • Facilitate innovation in cultural heritage. Knowledge sharing (linked data), IPR business models, policy – advocacy and public domain, data provider agreements. If you write code based on their open sourced applications, they'd love you to commit any code back into Europeana. Also, look at Europeana labs.
  • Engagement – create dialogue and participation. [These slides went quickly, I couldn't keep up]. Examples of the Great War Archive into Europe [?]. Showing the European connection – Art Nouveau works across Europe.
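
[To make the Vikings example concrete: if each country's tags were published as SKOS concepts with mapping links between them, a single query could pull back both sets of labels. A minimal sketch – the use of skos:closeMatch and the concept/variable names are my assumptions, not Europeana's actual data model:]

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    # For each concept in the Irish vocabulary that has been mapped to a
    # Norwegian counterpart, return both preferred labels.
    SELECT ?irishLabel ?norwegianLabel WHERE {
      ?irishConcept skos:prefLabel ?irishLabel ;
                    skos:closeMatch ?norwegianConcept .
      ?norwegianConcept skos:prefLabel ?norwegianLabel .
    }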

The next talk was Liam Wyatt on 'Peace, love and metadata', based in part on his experience at the British Museum, where he volunteered for a month to coordinate the relationship between Wikipedia as representative of the open web [might have mistyped that, it seems quite a mantle to claim] and the BM as representative of [missed it]. The goal was to build a proactive relationship of mutual benefit without requiring change in the policies or practices of either. [A nice bit of realism because IMO both sides of the museum/Wikipedia relationship are resistant to change and firmly attached to parts of their current models that conflict with the other organisation's.]

The project resulted in 100 new Wikipedia articles, mostly based on the BM/BBC A History of the World in 100 Objects project (AHOW). [I'd love to know how many articles were improved as a result too.] They also ran a 'backstage pass' day where Wikipedians came on site, met with curators, had a backstage tour, then sat down and created/updated entries. There were also one-on-one collaborations – hooking up Wikipedians and curators/museums with e.g. photos of objects requested.

It's all about improving content, focussing on personal relationships and leveraging the communities; it wasn't focussed on residencies (his own role); none of them are content donation projects; every institution has different needs but can do some version of this.

[I'm curious about why it's about bringing Wikipedians into museums and not turning museum people into Wikipedians, but I guess that's a whole different project and may result from the personal relationships anyway.]

Unknown risks are accounted for and overestimated. Unknown rewards are not accounted for and underestimated. [Quoted for truth, and I think this struck a chord with the audience.]

Reasons he's heard for restricting digital access… The most common is 'preserving the integrity of the collection', but it sounds like a need to approve content so they can approve its usages. As a result he's seen convoluted copyright claims – copyright is an easy tool to use to retain control.

Derivative works. Commercial use. Different types of free – freedom to use; freedom to study and apply knowledge gained; freedom to make and redistribute copies; [something else].

There are only three applicable licences for Wikipedia. Wikipedia is a non-commercial organisation, but they don't accept any non-commercially licensed content as 'it would restrict the freedom of people downstream to re-use the content in innovative ways'. [But this rules out much museum content, whether rightly or not, and with varying sources from legal requirements to preference. Licence wars (see the open source movement) are boring, but the public would have access to more museum content on Wikipedia if that restriction was negotiable. Whether that would outweigh the possible 'downstream' benefit is an interesting question.]

Liam asked the audience: do you have a volunteer project in your institution? Do you have an e-volunteer program? Well, you do already, you just don't know it. It's a matter of whether you want to engage with them in return. You don't have to, and it might be messy.

Wikipedia is not a social network. It is a social construction – it requires a community to exist but socialising is not the goal. Wikipedia is not user generated content. Wikipedia is community curated works. Curated, not only generated. Things can be edited or deleted as well as added [which is always a difficulty for museums thinking about relying on Wikipedia content in the long term, especially as the 'significance' of various objects can be a contested issue.]

Happy datasets are all alike; every unhappy dataset is unhappy in its own way. A good test of data is that it works well with others – technically or legally.

According to Liam, Europeana is the 21st-century version of the gallery painting – it's a thumbnail gallery, but it could be so much more if the content was technically and legally able to be re-used and integrated.
Data often comes with enough restrictions already, e.g. copyright, donor restrictions; if it comes without restrictions, it's a shame to add them. 'Leave the gate as you found it'.

'We're doing the same thing for the same reason for the same people in the same medium, let's do it together.'

The next sessions were 'tasters' of the three thematic tracks of the second part of the day – linked data, user-generated content, and risks and rewards. This was a great idea because I felt like I wasn't totally missing out on the other sessions.

Ian Davis from Talis talked about 'linked open culture' as a preview of the linked data track: how to take practices learned from linked data and apply them to the open culture sector. We're always looking for ways to exchange info and communicate more effectively. We're no longer limited by the physicality of information. 'The semantic web fundamentally changes how information, machines and people are connected together'. The semantic web and its powerful network effects are enabling a radical transformation away from islands of data. One question is: does preservation require protection and isolation, or copying it as widely as possible?

Conjecture 1 – data outlasts code. MARC stays forever, code changes. This implies that open data is more important than open source.
Conjecture 2 – structured data is more valuable than unstructured. Therefore we should seek to structure our data well.
Conjecture 3 – most of the value in our data will be unexpected and unintended. Therefore we should engineer for serendipity.

'Provide and enable' – UK National Archives phrase. Provide things you're good at – use unique expertise and knowledge [missed bits]… enable as many people as possible to use it – licence data for re-use, give important things identifiers, link widely.

'The goal is not to build a web of data. The goal is to enrich lives through access to information.'
[I think this is my new motto – it sums it up so perfectly. Yes, we carry on about the technology, but only so we can get it built – it's the means to an end, not the end itself. It's not about applying acronyms to content, it's about making content more meaningful, retaining its connection to its source and original context, making the terms of use clear and accessible, making it easy to re-use, encouraging people to make applications and websites with it, blah blah blah – but it's all so that more people can have more meaningful relationships with their contemporary and historical worlds.]

Kevin Sumption from the National Maritime Museum presented on the user-generated content track. A look ahead – the cultural sector and new models… User-generated content (UGC) is a broad description for content created by end users rather than traditional publishers. Museums have been active in photo-sharing, social tagging and Wikipedia editing.

Crowdsourcing, e.g. reCAPTCHA [digitising books, one registration form at a time]. His team was inspired by the approach and created a project called 'Old Weather' – people review the logs of WWI British ships to transcribe the content, especially meteorological data. This fills in a gap in the meteorological dataset for 1914–1918, allows weather in the period to be modelled, and contributes to understanding of global weather patterns.

Also working with Oxford Uni, the Rutherford Institute and Zooniverse on Solar Stormwatch – a solar weather forecast. The museum is working with research institutions to provide data to solve real-world problems. [Museums can bring audiences to these projects and re-ignite interest in science; you can sit at home or on the train and make real contributions to on-going research – how cool is that?]

Community collecting, e.g. the Mass Observation project from 1937 – relaunched now, and you can train to become an observer. You get a brief, e.g. families on holidays.

BBC WW2 People's War – archive of WWII memories. [check it out]

RunCoCo – tools for people to set up community-led, community-generated projects.

Community-led research – a bit more contentious – e.g. the Guardian and MPs' expenses. Putting data in the hands of the public, trusting them to generate content. [Though if you're just getting people to help filter up interesting content for review by trusted sources, it's not that risky.]

The final thematic track preview was by Charles Oppenheim from Loughborough University, on the risks and rewards of placing metadata and content on the web. Legal context – authorisation of the copyright holder is required for [various acts including putting it on the web] unless… it's out of copyright, you have explicit permission from the rights holder (not an implied licence just because it's online), permission has been granted under a licensing scheme, or the work was created by a member of staff or under contract with IP assigned.

Issues with cultural objects – media rich content – multiple layers of rights, multiple rights holders, multiple permissions often required. Who owns what rights? Different media industries have different traditions about giving permission. Orphan works.

Possible non-legal ramifications of IPR infringements – loss of trust with rights holders/creators; loss of trust with the public; damage to reputation/bad press; breach of contract (funding bodies or licensors); additional fees/costs; takedown of content or the entire service.

Help is at hand – Strategic Content Alliance toolkit [online].

Copyright has less to do with law than with risk management – assess the risks and work out how you will minimise them.

Risks beyond IPR – defamation; liability for provision of inaccurate information; illegal materials e.g. pornography, pro-terrorism, violent materials, racist materials, Holocaust denial; data protection/privacy breaches; accidental disclosure of confidential information.

High risk – anything you make money from; copying anything that is in copyright and is commercially available.
Low risk – orphan works of low commercial value – letters, diaries, amateur photographs, films and recordings by less well-known people.
Zero risk stuff.
Risks on the other side of the coin [aka excuses for not putting stuff up].

'Museums meet the 21st century' – OpenTech 2010 talk

These are my notes for the talk I gave at OpenTech 2010 on the subject of 'Museums meet the 21st Century'. Some of it was based on the paper I wrote for Museums and the Web 2010 about the 'Cosmic Collections' mashup competition, but it also gave me a chance to reflect on bigger questions: so we've got some APIs and we're working on structured, open data – now what? Writing the talk helped me crystallise two thoughts that had been floating around my mind. One, that while "the coolest thing to do with your data will be thought of by someone else", that doesn't mean they'll know how to build it – developers are a vital link between museum APIs, linked data, etc and the general public; two, that we really need either aggregated datasets or data using shared standards to get the network effect that will enable the benefits of machine-readable museum data. The network effect would also make it easier to bridge gaps in collections, reuniting objects held in different institutions. I've copied my text below; slides are embedded at the bottom if you'd rather just look at the pictures. I had some brilliant questions from the audience and afterwards – I hope I was able to do them justice. OpenTech itself was a brilliant day full of friendly, inspiring people – if you can possibly go next year then do!

Museums meet the 21st century.
Open Tech, London, September 11, 2010

Hi, I'm Mia, I work for the Science Museum, but I'm mostly here in a personal capacity…

Alternative titles for this talk included: '18th century institution WLTM 21st century for mutual benefit, good times'; 'the Age of Enlightenment meets the Age of Participation'. The common theme behind them is that museums are old, slow-moving institutions with their roots in a different era.

Why am I here?

The proposal I submitted for this was 'Museums collaborating with the public – new opportunities for engagement?', which was something of a straw man, because I really want the answer to be 'yes, new opportunities for engagement'. But I didn't just mean any 'public', I meant specifically a public made up of people like you. I want to help museums open up data so more people can access it in more forms, but most people can't just have a bit of a tinker and create a mashup. “The coolest thing to do with your data will be thought of by someone else” – but that doesn’t mean they’ll know how to build it. Audiences out there need people like you to make websites and mobile apps and other ways for them to access museum content – developers are a vital link in the connection between museum data and the general public.

So there's that kind of help – helping the general public get into our data; and there's another kind of help – helping museums get their data out. For the first, I think I mostly just want you to know that there's data out there, and that we'd love you to do stuff with it.

The second is a request for help working on things that matter. Linkable, open data seems like a no-brainer, but museums need some help getting there.

Museums struggle with the why, with the how, and increasingly with the "we are reducing our opening hours, you have to be kidding me".

Chicken and the egg

Which comes first – museums get together and release interesting data in a usable form under a useful licence and developers use it to make cool things, or developers knock on the doors of museums saying 'we want to make cool things with your data' and museums get it sorted?

At the moment it's a bit of both, but the efforts of people in museums aren't always aligned with the requests from developers, and developers' requests don't always get sent to someone who'll know what to do with them.

So I'm here to talk about some stuff that's going on already and ask for a reality check – is this an idea worth pursuing? And if it is, then what next?
If there’s no demand for it, it won’t happen. Nick Poole, Chief Executive, Collections Trust, said on the Museums Computer Group email discussion list: "most museum people I speak to tend not to prioritise aggregation and open interoperability because there is not yet a clear use case for it, nor are there enough aggregators with enough critical mass to justify it.”

But first, an example…

An experiment – Cosmic Collections, the first museum mashup competition

The Cosmic Collections project was based on a simple idea – what if a museum gave people the ability to make their own collection website for the general public? Way back in December 2008 I discovered that the Science Museum was planning an exhibition on astronomy and culture, to be called ‘Cosmos & Culture’. They had limited time and resources to produce a site to support the exhibition and risked creating ‘just another exhibition microsite’. I went to the curator, Alison Boyle, with a proposal – what if we provided access to the machine-readable exhibition content that was already being gathered internally, and threw it open to the public to make websites with it? And what if we motivated them to enter by offering competition prizes? Competition participants could win a prize and kudos, and museum audiences might get a much more interesting, innovative site. Astronomy is one of the few areas where the amateur can still make valued scientific contributions, so the idea was a good match for museum mission, exhibition content, technical context, and hopefully developers – but was that enough?

The project gave me a chance to investigate some specific questions. At the time, there were lots of calls from some quarters for museums to produce APIs for each project, but there was also doubt about whether anyone would actually use a museum API, whether we could justify an investment in APIs and machine-readable data. And can you really crowdsource the creation of collections interfaces? The Cosmic Collections competition was a way of finding out.

Lessons? An API isn't a magic bullet, you still need to support the dev community, and encourage non-technical people to find ways to play with it. But the project was definitely worth doing, even if just for the fact that it was done and the world didn't end. Plus, the results were good, and it reinforced the value of working with geeks. [It also got positive coverage in the technical press. Who wouldn’t be happy to hear ‘the museum itself has become an example of technological innovation’ or that it was ‘bringing museums out into the open as places of innovation’?]

Back to the chicken and the egg – linking museums

So, back to the chicken and the egg… Progress is being made, but it gets bogged down in discussions about how exactly to get data online. Museums have enough trouble getting the suppliers they work with to produce code that meets accessibility standards, let alone beautifully structured, re-usable open data.

One of the reasons open, structured data is so attractive to museum technologists is that we know we can never build interfaces to meet the needs of every type of audience. Machine-readable data should allow people with particular needs to create something that supports their own requirements or combines their data with ours to make lovely new things.

Explore with us – tell museums what you need

So if you're someone who wants to build something, I want to hear from you about what standards you're already working with, which formats work best for you…

To an extent that's just moving the problem further down the line, because I've discovered that when you ask people what data standards they want to use, they all tell you something different… but at least progress is being made.

Dragons we have faced

I think museums are getting to the point where they can live with the 80% in the interest of actually getting stuff done.

Museums need to get over the idea that linkable data must be perfect – perfectly clean data, perfectly mapped to perfect vocabularies and perfectly delivered through perfect standards. Museums are used to mapping data from their collections management systems for a known end-use; they've struggled with open-ended requirements for unknown future uses.

The idea that aggregated data must be able to do everything that data provided at source can do has held us back. Aggregated data doesn't need to be able to do everything – sometimes discoverability is enough, as long as you can get back to the source if you need the rest of the data. Sometimes it's enough to be able to link to someone else's record that you've discovered.

Museum data and the network effect

One reason I'm here (despite the fact that public speaking is terrifying) is a vision of the network effect that could apply when we have open museum data.

We could re-unite objects across time and place and people, connecting visitors and objects regardless of the owning institution or the type of object or information. We could create highlight collections by mining data across museums, using the links people are making between our collections. We can help people tell their local stories as well as stories about big subjects and world histories. Shared data standards should reduce the learning curve for people using our data, which would hopefully increase re-use.

Mismatches between museums and tech – reasons to be patient

So that's all very exciting, but since I've also learnt that talking about something creates expectations, here are some reasons to be patient with museums, and tolerant when we fail to get it right the first time…

IT is not a priority for most museums; keeping our objects secure and in one piece is, as is getting some of them on display in ways that make sense to our audiences.

Museums are slow. We'll be talking about stuff for a long time before it happens, because we have limited resources and risk-averse institutions. Museum project management is designed for large infrastructure projects, moving hundreds of delicate objects around while major architectural builds go on. It's difficult to find space for agility and experimentation within that.

Nancy Proctor from the Smithsonian said this week: "[Museum] work is more constrained than a general developer" – it must be of the highest quality; for everybody – public good requires relevance and service for all, and because museums are in the 'forever business' it must be sustainable.

How you can make a difference

Museums are slowly adapting to the participation models of social media. You can help museums create (backend) architectures of participation. Here are some places where you can join in conversations with museum technologists:

Museums Computer Group – events, mailing list http://museumscomputergroup.org.uk/ #ukmcg @ukmcg

Linking Museums – meetups, practical examples, experimenting with machine-readable data http://museum-api.pbworks.com/

Space Time Camp – Nov 4/5, #spacetimecamp

‘Museums and the Web’ conference papers online provide a good overview of current work in the sector http://www.archimuse.com/conferences/mw.html

So that's all fun, but to conclude – this is all about getting museums to the point where the technology just works, data flows like water and our energy is focussed on the compelling stories museums can tell with the public. If you want to work on things that matter – museums matter, and they belong to all of us – we should all be able to tell stories with and through museums.

Thank you for listening

Keep in touch at @mia_out or https://openobjects.org.uk/

Linking museums: machine-readable data in cultural heritage – meetup in London July 7

Somehow I've ended up organising a (very informal) event about 'Linking museums: machine-readable data in cultural heritage' on Wednesday, July 7, at a pub near Liverpool St Station. I have no real idea what to expect, but I'd love some feisty sceptics to show up and challenge people to make all these geeky acronyms work in the real museum world.

As I posted to the MCG list: "A very informal meetup to discuss 'Linking museums: machine-readable data in cultural heritage' is happening next Wednesday. I'm hoping for a good mix of people with different levels of experience and different perspectives on the issue of publishing data that can be re-used outside the institution that created it. … please do pass this on to others who may be interested. If you would like to come but can't get down to that London, please feel free to send me your questions and comments (or beer money)."

The basic details are: July 7, 2010, Shooting Star pub, London. 7:30 – 10pm-ish. More information is available at http://museum-api.pbworks.com/July-2010-meetup and you can let me know you're coming or register your interest.

In more detail…

Why?
I'm trying to cut through the chicken and egg problem – as a museum technologist, I can work towards getting machine-readable data available, but I'm not sure which formats and what data would be most useful for developers who might use it. Without a critical mass of take-up for any one type, the benefits of any one data source are more limited for developers. But museums seem to want a sense of where the critical mass is going to be so they can build for that. How do we cut through this and come up with a sensible roadmap?

Who?
You! If you're interested in using museum data in mashups but find it difficult to get started or find the data available isn't easily usable; if you have data you want to publish; if you work in a museum and have a data publication problem you'd like help in solving; if you are a cheerleader for your favourite acronym…

Put another way, this event is for you if you're interested in publishing and sharing data about museums and collections through technologies such as linked data and microformats.

It'll be pretty informal! I'm not sure how much we can get done, but it'd be nice to put faces to names, and maybe start some discussions around the various problems that could be solved and tools that could be created with machine-readable data in cultural heritage.

Some thoughts on linked data and the Science Museum – comments?

I've been meaning to finish this for ages so I could post it, but then I realised it's more use in public in imperfect form than in private, so here goes – my thoughts on linked data, APIs and the Science Museum on the 'Museums and the machine-processable web' wiki. I'm still trying to find time to finish documenting my thoughts, and I've already had several useful comments that mean I'll need to update it, but I'd love to hear your thoughts, comments, etc.

Getting closer to a time machine-x-curator in your pocket

If life is what happens to you while you're busy making other plans, then I'm glad I've been busy planning various things, because it meant that news of the EU-funded iTacitus project was a pleasant surprise. The project looked at augmented reality, itinerary planning and contextual information for cultural tourism.

As described on their site and this gizmowatch article, it's excitingly close to the kind of 'Dopplr for cultural heritage' or 'pocket curatr' I've written about before:

Visitors to historic cities provide the iTacitus system with their personal preferences – a love of opera or an interest in Roman history, for example – and the platform automatically suggests places to visit and informs them of events currently taking place. The smart itinerary application ensures that tourists get the most out of each day, dynamically helping them schedule visits and directing them between sites.
Once at their destination, be it an archaeological site, museum or famous city street, the AR component helps bring the cultural and historic significance to life by downloading suitable AR content from a central server.

There's a video showing some of the AR stuff (superimposed environments, annotated landscapes) in action on the project site. It didn't appear to have sound, so I don't know if it also demonstrated the 'Spatial Acoustic Overlays'.

The NPG's response to the Wikimedia kerfuffle

[Apparently responses are being listed on a Wikimedia page, which I suppose makes sense but please bear in mind this is usually read by about five people who know my flippant self in real life]

I haven't been able to get the press release section of the National Portrait Gallery's site to load, so I'm linking to an email from the NPG posted as a comment on another blog. I'm still thinking this through, but currently the important bit, to me, is this:

The Gallery is very concerned that potential loss of licensing income from the high-resolution files threatens its ability to reinvest in its digitisation programme and so make further images available. It is one of the Gallery’s primary purposes to make as much of the Collection available as possible for the public to view.

Digitisation involves huge costs including research, cataloguing, conservation and highly-skilled photography. Images then need to be made available on the Gallery website as part of a structured and authoritative database.

Obviously, I am paid by a museum to put things online so I might be biased towards something that ultimately means my job exists – but while a government funding gap exists, someone has to pay the magical digitisation fairies. [This doesn't mean I think it's right, but the situation is not going to be changed by an adversarial relationship between WMF and the cultural heritage sector, which is why this whole thing bothers me.  Lots of good work explaining the Commons models and encouraging access is being undone.]

You can't even argue that the NPG is getting increased exposure or branding through the use of their images, as there's a big question over whether images hosted on Wikimedia are being incorrectly given new attribution and rights statements.  Check the comment about the image on this blog post, and the Wikipedia statement from Wikimedia about the image and the original image page.  

To use a pub analogy, is Wikimedia the bad mate who shouts other people a round on your tab?

Final thoughts on open hack day (and an imaginary curatr)

I think hack days are great – sure, 24 hours in one space is an artificial constraint, but the sheer brilliance of the ideas and the ingenuity of the implementations is inspiring. They're a reminder that good projects don't need to take years and involve twenty circles of sign-off, even if that's the reality you face when you get back to the office.

I went because it tied in really well with some work projects (like the museum metadata mashup competition we're running later in the year, or the attempt to get a critical mass of vaguely compatible museum data available for re-use) and stuff I'm interested in personally (like modern bluestocking, my project for this summer – let me know if you want to help, or just add inspiring women to Freebase).

I'm also interested in creating something like a Dopplr for museums – you tell it what you're interested in, and when you go on a trip it makes you a map and list of stuff you could see while you're in that city.

Like: I like Picasso, Islamic miniatures, city museums, free wine at contemporary art gallery openings, [etc]; am inspired by early feminist history; love hearing about lived moments in local history of the area I'll be staying in; I'm going to Barcelona.

The 'list of cultural heritage stuff I like' could be drawn from stuff you've bookmarked, exhibitions you've attended (or reviewed) or stuff favourited in a meta-museum site.

(I don't know what you'd call this – it's like a personal butlr or concierge who knows both your interests and your destinations – curatr?)

The talks on RDFa (and the earlier talk on YQL at the National Maritime Museum) have inspired me to pick a 'good enough' protocol, implement it, and see if I can bring in links to similar objects in other museum collections. I need to think about the best way to document any mapping I do between taxonomies, ontologies, vocabularies (all the museumy 'ies') and different API functions or schemas, but I figure the museum API wiki is a good place to draft that. It's not going to happen instantly, but it's a good goal for 2009.

These are the last of my notes from the weekend's Open Hack London event, my notes from various talks are tagged openhacklondon.

Tom Morris, SPARQL and semweb stuff – tech talk at Open Hack London

Tom Morris gave a lightning talk on 'How to use Semantic Web data in your hack' (aka SPARQL and semantic web stuff).

He's since posted his links and queries – excellent links to endpoints you can test queries in.

The semantic web is often thought of as a long-promised magical elixir; he's here to say it can be used now, by showing examples of queries that can be run against semantic web services. He demonstrated two different online datasets and one database that can be installed on your own machine.

First – DBpedia – they scraped lots of Wikipedia and put it into a database. DBpedia isn't like your average database – you can't draw a UML diagram of Wikipedia. It's done in RDF and Linked Data. It can be queried in a language that looks like SQL but isn't: SPARQL, a W3C standard; they're currently working on SPARQL 2.

Go to dbpedia.org/sparql and submit the query as a POST. [Really nice – I have a thing about APIs and platforms needing a really easy way to get you to 'hello world', and this does it pretty well.]

[Line by line comments on the syntax of the queries might be useful, though they're pretty readable as it is.]

'select thingy, wotsit where [the slightly more complicated stuff]'
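
[To make that less abstract, here's the shape of a real query you could paste into the form at dbpedia.org/sparql – a minimal sketch from memory, so treat the DBpedia ontology property names as assumptions rather than gospel:]

    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Museums located in London, with their English labels.
    SELECT ?museum ?name WHERE {
      ?museum a dbo:Museum ;
              rdfs:label ?name ;
              dbo:location <http://dbpedia.org/resource/London> .
      FILTER(langMatches(lang(?name), "en"))
    }
    LIMIT 10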

You can get back results in XML, and also HTML, 'spreadsheet' and JSON. Ugly but readable. Typed.

[Trying a query challenge set by others could be fun way to get started learning it.]

One problem – fictional places are in Wikipedia e.g. Liberty City in Grand Theft Auto.

Libris – how library websites should be
[I never used to appreciate how much most library websites suck until I started back at uni and had to use one for more than one query every few years]

Has a query interface through SPARQL

Comment from the audience: the BBC now has a SPARQL endpoint [as of the day before? Go BBC guy!].

He's been playing with Mulgara, an open source Java triple store. [Mulgara looks like a kinda faceted search/browse thing.] It has its own query language called TQL, which can do more interesting things than SPARQL. Why use it? Schemaless data storage. It is to SQL what dynamic typing is to static typing. [Did he mean 'is to SPARQL'?]

Question from the audience: how do you discover what you can query against?
Answer: the DBpedia website should list the concepts they have in there. There's also some documentation of categories you can look at. [Examples and documentation are so damn important for the uptake of your API/web service.]

Coming soon [?]: SPARUL – an update language; SPARQL 2 – new features.

The end!

[These are more (very) rough notes from the weekend's Open Hack London event – please let me know of clarifications, questions, links or comments. My other notes from the event are tagged openhacklondon.

Quick plug: if you're a developer interested in using cultural heritage (museums, libraries, archives, galleries, archaeology, history, science, whatever) data – a bunch of cultural heritage geeks would like to know what's useful for you (more background here). You can comment on the #chAPI wiki, or tweet @miaridge (or @mia_out). Or if you work for a company that works with cultural heritage organisations, you can help us work better with you for better results for our users.]

There were other lightning talks on Pachube (pronounced 'patchbay', about trying to build the internet of things, making an API for gadgets because e.g. connecting hardware to the web is hard for small makers) and Homera (an open source 3d game engine).

RDFa, SearchMonkey – tech talks at Open Hack London

While today's Open Hack London event is mostly about the 24-hour hackathon, I signed up just for the Tech Talks because I couldn't afford to miss a whole weekend's study in the fortnight before my exams (stupid exams). I went to the sessions on 'Guardian Data Store and APIs', 'RDFa SearchMonkey', Arduino, 'Hacking with PHP', 'BBC Backstage', Dopplr's 'mashups made of messages' and lightning talks including 'SPARQL and semantic web' stuff you can do now.

I'm putting my rough and ready notes online so that those who couldn't make it can still get some of the benefits. Apologies for any mishearings or mistakes in transcription – leave me a comment with any questions or clarifications.

One of the reasons I was going was to push my thinking about the best ways to provide API-like access to museum information and collections, so my notes will reflect that but I try to generalise where I can. And if you have thoughts on what you'd like cultural heritage institutions to do for developers, let us know! (For background, here's a lightning talk I did at another hack event on happy museums + happy developers = happy punters).

RDFa – now everyone can have an API.
Mark Birkbeck

Going to cover some basic mark-up, and talk about why RDFa is a good thing. [The slides would be useful for the syntax examples, I'll update if they go online.]

RDFa is a new syntax from W3C – a way of embedding metadata (RDF) in HTML documents using attributes.

e.g. <span property="dc:title"> – the value of the property is the text inside the span.

Because it's inline, you don't need to point to another document as the source of the metadata – it lives in the same place as the presentation HTML.

One big advance is that you can provide metadata for other items, e.g. images, so you can attach licence info to the image rather than the page it's in – e.g. <img src="" rel="license" resource="[creative commons licence]">
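
[Putting those two patterns together, a marked-up object record might look something like this – a minimal sketch with made-up URLs, using the standard Dublin Core vocabulary and the reserved 'license' rel value, not any museum's real mark-up:]

    <div xmlns:dc="http://purl.org/dc/elements/1.1/"
         about="http://example.org/objects/42">
      <!-- the text inside the span becomes the value of dc:title -->
      <span property="dc:title">Orrery, about 1750</span>
      <!-- the licence attaches to the image itself, not the page -->
      <img src="orrery.jpg" alt="Orrery"
           rel="license"
           resource="http://creativecommons.org/licenses/by/3.0/" />
    </div>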

Putting RDFa into web pages means you've now got a feed (the web page is the RSS feed), and a simple static web page can become an API that can be consumed in the same way as stuff from a big expensive system. 'Growing adoption'.

The government department the Central Office of Information [?] is quite big on RDFa and has a number of projects with it. [I'd come across the UK Civil Service Job Service API while looking for examples for work presentations on APIs.]

RDFa allows for flexible publishing options. If you're already publishing HTML, you can add RDFa mark-up and get flexible publishing models – different departments can keep publishing data in their own way, and a central website can go and request it from each of them and create its own database of e.g. jobs. A decentralised way of approaching data distribution.

Can be consumed by: smarter browsers; client-side AJAX, other servers such as SearchMonkey.

He's interested in what browsers can do with it – either enhanced browsers that could e.g. store contact info from a page into your address book, or JavaScript libraries that can parse the page and do something with it. [Screen shot of jobs data in SearchMonkey with enhanced search results.]

RDFa might be going into Drupal core.

Example of putting an ISBN in RDFa in a page; a parser can then go through the page, pull out the triples [some explanation of them as a mini database?], and pull back more info about the book from other APIs, e.g. Amazon – full title, thumbnail of the cover – e.g. in Pipes.

Example of FOAF – a twitter account marked up in a page can pull in tweets. Could presumably pull in newer services as more things were added, without having to re-mark-up all the pages.

Example of a chemist writing a blog who mentions a chemical compound in a blog post; a processor can go off and retrieve more info – e.g. add an icon with mouseover info – an image of the molecule, or a link to more info.

The next plan is to link with BOSS. You can get RDFa back from search results – augmenting search results with RDFa from the original page.

Search Monkey (what it is and what you can do with it)
Neil Crosby (European frontend architect for search at Yahoo).

SearchMonkey is (one of) Yahoo's open search platforms (along with BOSS). It uses structured data to enhance search results. You get to change stuff on the Yahoo search results page.

SearchMonkey lets you: style results for certain URL patterns; brand those results; make the results more useful for users.

[Are there examples of sites that have done it, to see how their results look in Yahoo? I thought he mentioned IMDb but it doesn't look any different – a film search that returns a Wikipedia result, OTOH, does.]

Make life better for users – it's not just what Yahoo thinks results should be; you can say 'actually this is the important info on the page'.

Three ways to change the SERP [search engine results page]: mark up data in a way that Yahoo knows about – 'just structure your data nicely', e.g. video mark-up; enhance a result directly; or make an infobar.

Infobar – it doesn't change the result you see immediately on the page, but it opens on the page. Example of an auto-enhanced result: Playcrafter. There's a link to a developer start page – how to mark it up, with examples, and what it all means.

User-enhanced result – Facebook profile pages are marked up with microformats – you can add as friend, poke, send message, view friends, etc from the search results page. You can change the title and abstract, add an image, favicon, quicklinks and key/value pairs. Create at [link I can't see but it's on the slides]. Displayed on screen, you fill it out on a template.

Infobar – a dropdown in the grey bar under results. It can do a lot more, as it's hidden in the infobar and doesn't have to worry people.

Data from: microformats, RDF, XSLT, Yahoo's index, and soon, top tags from Delicious.

If there's no machine data, you can write an XSLT – it 'isn't that hard'. Lots of documentation on the web.

Examples of things that have been made: a tool that exposes all the metadata known for a page [URL on slide] – you can install it on the Yahoo search page. Use location data to make a map – any page on the web with metadata about locations on it – Map Monkey. Get Qype results for anything you search for.

There's a mailing list (people willing and wanting to answer questions) and a tutorial.

Questions

Question: do you need to use a special doctype [for RDFa]?
Answer: it was added to the spec that 'you should use this doctype', but the spec allows for RDFa to be used in situations where you can't change the doctype, e.g. RDFa embedded in a Blogger blog post. Most parsers walk the DOM rather than relying on the doctype.

Jim O'D – excited that SearchMonkey supports XSLT – if you have a website with correctly marked-up tables, could you expose those as key/value pairs?
Answer: yes. XSLT is a fantastic tool for when you don't have data marked up – you can still get to it.

Frankie – question I couldn't hear. About info out to users?
Answer: if you've built a monkey, it's up to you to tell people about it for the moment. Some monkeys are auto-on, e.g. Facebook, Wikipedia… possibly in future, if you developed a monkey for a site you own, you might be able to turn it auto-on in the results for all users… they're not sure yet if they'll do it or not.
Frankie: is the plan that people get the monkeys they want, or go through a gallery?
Answer: it would be fantastic if they could work out what people are using them for and suggest ones appropriate to people doing particular kinds of searches, rather than having to go to a gallery.