Tom Morris, SPARQL and semweb stuff – tech talk at Open Hack London

Tom Morris gave a lightning talk on 'How to use Semantic Web data in your hack' (aka SPARQL and semantic web stuff).

He's since posted his links and queries – excellent links to endpoints you can test queries in.

The semantic web is often thought of as a long-promised magical elixir; he's here to say it can be used now, by showing examples of queries that can be run against semantic web services. He'll demonstrate two different online datasets and one database that can be installed on your own machine.

First – dbpedia – scraped lots of Wikipedia, put it into a database. dbpedia isn't like your average database, you can't draw a UML diagram of Wikipedia. It's done in RDF and Linked Data. Can be queried in a language that looks like SQL but isn't. SPARQL – is a W3C standard, they're currently working on SPARQL 2.

Go to dbpedia.org/sparql – submit query as a POST. [Really nice – I have a thing about APIs and platforms needing a really easy way to get you to 'hello world' and this does it pretty well.]

[Line by line comments on the syntax of the queries might be useful, though they're pretty readable as it is.]

'select thingy, wotsit where [the slightly more complicated stuff]'
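[A rough sketch of what that looks like in practice, with the line-by-line comments I wished for. This is my own illustration, not one of Tom's queries: the query just asks for five things with English labels, and the Python only builds the POST request without sending it. The endpoint is the one mentioned in the talk; the `query` form field and the `application/sparql-results+json` media type come from the W3C SPARQL protocol, while `build_request` is a made-up helper name.]

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint mentioned in the talk

# Illustrative query only (an assumption, not from the talk):
#   PREFIX  declares a shorthand for a long URI
#   SELECT  names the variables to return ('thingy, wotsit')
#   WHERE   holds the triple patterns ('the slightly more complicated stuff')
#   FILTER  keeps only English-language labels
#   LIMIT   caps the number of rows
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label
WHERE {
  ?thing rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare (but do not send) a form-encoded SPARQL POST request."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(endpoint, data=body, headers=headers, method="POST")


request = build_request(QUERY)
# To actually run it: urllib.request.urlopen(request).read()
```

[Nothing goes over the network until you call `urlopen`, so you can inspect `request.data` first to see exactly what would be sent.]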

Can get back results in xml, also HTML, 'spreadsheet', JSON. Ugly but readable. Typed.
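[To show what 'typed' means for the JSON flavour: a minimal sketch that reads a result document in the W3C SPARQL JSON results layout (`head.vars` plus `results.bindings`). The sample data is invented (example.org, not real dbpedia output), and `to_python` and `rows` are hypothetical helper names. I've assumed typed values may arrive as either `literal` or the older `typed-literal`, and only integers are converted here.]

```python
import json

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Invented sample in the SPARQL JSON results layout (not real endpoint output)
SAMPLE = """
{
  "head": {"vars": ["place", "population"]},
  "results": {"bindings": [
    {
      "place": {"type": "uri", "value": "http://example.org/place/Exampleton"},
      "population": {"type": "typed-literal",
                     "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                     "value": "1200"}
    }
  ]}
}
"""


def to_python(term):
    """Convert one bound term; only xsd:integer is handled in this sketch."""
    is_literal = term["type"] in ("literal", "typed-literal")
    if is_literal and term.get("datatype") == XSD_INTEGER:
        return int(term["value"])
    return term["value"]  # URIs and everything else stay as strings


def rows(doc):
    """Turn each binding into a plain dict keyed by variable name."""
    variables = doc["head"]["vars"]
    return [
        {name: to_python(binding[name]) for name in variables if name in binding}
        for binding in doc["results"]["bindings"]
    ]


result = rows(json.loads(SAMPLE))
```

[The `if name in binding` check matters because a variable can be unbound in a given row, in which case it is simply missing from that binding.]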

[Trying a query challenge set by others could be fun way to get started learning it.]

One problem – fictional places are in Wikipedia e.g. Liberty City in Grand Theft Auto.

Libris – how library websites should be
[I never used to appreciate how much most library websites suck until I started back at uni and had to use one for more than one query every few years]

Has a query interface through SPARQL

Comment from the audience: the BBC now have a SPARQL endpoint [as of the day before? Go BBC guy!].

Playing with mulgara, open source Java triple store. [mulgara looks like a kinda faceted search/browse thing] Has own query language called TQL which can do more interesting things than SPARQL. Why use it? Schemaless data storage. Is to SQL what dynamic typing is to static typing. [did he mean 'is to sparql'?]
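[To make 'schemaless' concrete for myself: a toy in-memory triple store. This is not mulgara's API and has nothing to do with TQL; `TripleStore`, `add` and `match` are names I've made up. The point is only that a new kind of fact needs no table or column to be declared first.]

```python
class TripleStore:
    """Toy store of (subject, predicate, object) triples, insertion-ordered."""

    def __init__(self):
        self._triples = {}  # dict used as an ordered set

    def add(self, subject, predicate, obj):
        self._triples[(subject, predicate, obj)] = None

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return [
            triple
            for triple in self._triples
            if all(want is None or want == got for want, got in zip(pattern, triple))
        ]


store = TripleStore()
store.add("Liberty City", "type", "City")
# A brand new predicate: no schema change needed, just add the triple
store.add("Liberty City", "appears_in", "Grand Theft Auto")
store.add("Liberty City", "type", "City")  # duplicates are ignored

about_liberty_city = store.match(subject="Liberty City")
appearances = store.match(predicate="appears_in")
```

[Querying is just pattern matching with wildcards, which is roughly what the triple patterns in a SPARQL WHERE clause do too.]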

Question from audience: how do you discover what you can query against?
Answer: dbpedia website should list the concepts they have in there. Also some documentation of categories you can look at. [Examples and documentation are so damn important for the uptake of your API/web service.]

Coming soon [?] SPARUL – update language, SPARQL2: new features

The end!

[These are more (very) rough notes from the weekend's Open Hack London event – please let me know of clarifications, questions, links or comments. My other notes from the event are tagged openhacklondon.

Quick plug: if you're a developer interested in using cultural heritage (museums, libraries, archives, galleries, archaeology, history, science, whatever) data – a bunch of cultural heritage geeks would like to know what's useful for you (more background here). You can comment on the #chAPI wiki, or tweet @miaridge (or @mia_out). Or if you work for a company that works with cultural heritage organisations, you can help us work better with you for better results for our users.]

There were other lightning talks on Pachube (pronounced 'patchbay', about trying to build the internet of things, making an API for gadgets because e.g. connecting hardware to the web is hard for small makers) and Homera (an open source 3d game engine).

Running notes, day 3 (Saturday) of MW2009

These are my running notes from day 3 of the Museums and the Web conference – as the perfect is the enemy of the good I'm getting these up 'as is'. I did a demo [abstract] in the morning but haven't written up my notes yet – shame on me!

The session 'Building and using online collections' included three papers, I've got notes from all three but my laptop battery died halfway through the session so only some of them are already typed – I'll update this entry when I can sneak some time.

Paul Rowe presented on NZMuseums: Showcasing the collections of all New Zealand museums (the linked abstract includes the full paper and slides).

National Services Te Paerangi (NSTP).

4 million NZers, 400 museums.  NZMuseums website – focal point for all NZ museums. NSTP administers the site, Vernon Systems is solution provider.

Each museum has a profile page including highlights of their collections. Web-based collection management system.

What needs to be in place for small museums to contribute? How can a portal be built with limited resources? What features of the website would encourage re-use of the data?

Some museums had good web presences, but what about the small museums? Facing same issues that small or local govt museums in the UK face.

Museums are treasures of the country, they show who we are. Website needs to reflect that.

Focus groups – volunteers are important – keep it simple; keep costs low; some places had limited internet connectivity; reservations about content being on the internet were common.

Promoting involvement to the sector – used existing national monthly newsletters to advertise workshops and content deadlines. Minimum of 20 items for placement on site to avoid 'box ticking' [some real commitment required]. Used online forum for FAQs.

Lack of skills – NSTP were trained so could then train staff and volunteers in museums. Digitising, photography for the web.

Had to explain benefits to small museums. It gave them an easy start to getting an online presence.

They overcame resistance by allowing watermarking and clear copyright statements; they showed existing museum sites that allowed tagging; promoted that it would help them reach a diverse, dispersed audience.

First tag on site – 'shiny nose'. First comment was someone admitting they'd touched the nose on a bronze sculpture.

eHive.

Could also import Excel spreadsheets, as the content management system didn't exist at an early stage of the project. Also provided a workaround for people without internet access – the spreadsheet could be posted on CD.

API provides glue to connect eHive (Collections Management System) and NZMuseums site together.  

Tips for success
Use OS software where possible; use existing online forums and communication networks to save answering questions over again.

90% of these collection items not previously available on the internet. 99% of collection items have images.

[Kiwis are heroes!  Everyone was incredibly modest about their achievements, but I think they're amazing.]

Next was Eero Hyvönen on CultureSampo – Finnish Culture on the Semantic Web 2.0: Thematic Perspectives for the End-user (the linked abstract includes the full paper and slides).

Helsinki semantic web thingies
Part of national ontology project, Finland
Vision – international semantic web of cultural heritage. Marriage between semweb and web 2.0

Challenges – content heterogeneity, complexity 

Other challenge relates to the way cultural content is produced – Freebase, Wikipedia, OpenStreetMap, etc.

Semweb for data integration; web 2.0 approach for content production

Automatically enriched by each piece of knowledge.

In Finnish the sampo is a magic drum that makes everything possible.  

Portal intended for human users and machines. Trying to establish a national way of producing content so can be published automatically.  

Infrastructure – 37,000 class concepts in ontology. MAO, TAO – museum ontologies, collaboratively built ontologies, then mapped to national system. End user sees one unified ontology. [A little pause while I pick my jaw up from the ground.]  66 vocabularies, taxonomies and ontologies available online as services, can be used as AJAX widgets. Some vocabularies are proprietary so can't be published online in the service.

28 content providers, 22 libraries and museums and some international associates like Getty places, Wikipedia.

16 different metadata schemas. [Including some for poetry!]

134,000 cultural collection items (artefacts, books, videos, etc)

285,000 other resources (places, people etc)

Annotation channel for content items – web 2.0 type interface.

Semantic web 2.0 portal

Portal users – for humans, Google-like but semantic search. Nine perspectives into cultural heritage. Three languages. Recently viewed items, recently commented items.

Map view.

With one line of JavaScript, can incorporate CultureSampo on own website.

[Sadly my laptop died here and the rest of my notes are handwritten.  You can probably get the gist from the published paper and the slide, but the coolness of their project was summed up by this tweet: Musebrarian: What can you do with a semantic knowledgebase? Search for "beard fashion in Finland" across time and place. #mw2009

It might not sound like much, but the breadth of content, and the number of interfaces onto it was awe-inspiring.]

Sadly my notes from Brian Dawson's paper, Collection effects: examining the actual use of on-line archival images are also still on notepaper.  The paper was a really useful examination of analytical approaches to understanding the motivations of people using cultural heritage collections.

Notes from the closing plenary, MW2009

These are my quick and dirty notes from the closing plenary of the 2009 Museums and the Web conference. If I've quoted you but gotten your name wrong, I'm very sorry – please let me know and I'll correct it. I haven't put links in for anyone yet so I'll be editing the entry anyway.

'We are the program.'  Awards for blog posts, tweets, Flickr photos then David Bearman invited people to come up and talk about what they've learnt, what they'll take away.

Nina, Museum 2.0 – inspired by Max's keynote address. But she didn't feel that difference in the institution. Didn't see the transparency and openness that you get on the web, on their dashboard. Not saying they have to do that, but wants to bring up idea of participatory ghetto… forming relationships with visitors on the web, who'll show up at museums and wonder why the same relationship isn't reflected in the building. Pushing in institutions to establish parity, not to give up on physical space also being somewhere for openness and transparency. IMA – had experience of extreme cognitive dissonance. How can you start the conversation, taking great stuff from web world into physical environment of institutions. Her first time at MW.

Heather from Balboa – new to conference and museum world, great introduction.

Nate, Walker Art Centre – I always leave inspired, seen it happen every time – a month's worth of trying new things, then it trickles off and fades… go to the wiki and take the post-conference challenge to do one thing in April – choose one task that you can achieve by the end of April. Distributed agile development … beyond API, everyone can benefit from going home and immediately doing just one thing. [eek I feel weird taking notes about my ideas]

Frankie, Rattle – be excited about tin mining.

Brian, UKOLN – danger that we're losing accessibility cos we're doing innovative things, but there have been some really great examples. Universally accessible – pushing it (the definition of it) forward.

Seb, Powerhouse – need to bring people in, curators, management.

Julie (?) – boundaries between web and physical boundaries – problematising the name of the conference. Is 'web' starting to constrain what we're about?

Nina – comment on that – conference in US called WebWise – lousy content but less funded projects, mostly director level people who go. How do we get these people in a situation that's more blended with the kind of people who are here?

Victoria, Smithsonian? carrying on Nina and Seb's point – spends first month being excited, but directors etc aren't going to come to conferences like this. You may have five minutes to articulate why something is important – and it's not heard when it's someone outside, even if you've been saying it on the inside for years. Having someone who's succeeded from outside, doing snippets of video or whatever – convincing.

David – seeing what can share back. Spend time at conference demanding people write papers, share slides… would really love for the post-conference discussion that takes place online to incorporate thoughts, experience about what doing. Extension into social space of a discourse we've never really had – how do you use that post-conference excitement… how do organisations change, which is becoming the centre of the discourse… take it further, keep talking to each other about how do you make it work.

Jennifer – the thing we can do by the end of April, if you write a report, share it with your colleagues. Let people pinch your ideas, send it out. Share the reports as well as the stuff that happens when we're right here.

Jon Pratty – we need more social media within the museum.

Peter Samis – can remember this camaraderie in 1991… hearing it just as fresh now with people who are coming to their first conference, loving it… this is going to have legs, it's going to keep running, continue this spirit throughout the year.

Rich (another Rich) – haven't really felt the amount of community before, but have been coming since 1999. Being able to catch up on the things he missed while he was here.

Brian – people in the community can fall out, it's happened in the UK. People have strongly held views, need to depersonalise disputes, constructive criticism.

Scott (?) – we're not the only people talking about these subjects, it's happening in higher education, the commercial sector, not a whole lot of discussion here about what's happening out there and what impact it has here. Would be neat to do some headlines on what's going on in the world outside museums, add to the implications for this audience.
[This final session probably contributed quite a bit to my summary of MW2009 – I'd written the 'MW2009 challenge' a little while before (after discussions at the ice cream API meet) and it was wonderful to feel so much excitement (tempered with realistic cynicism) in the room about the positive changes we could make when we went back to our home institutions.]

Oh noes, a FAIL! Notes from the unconference session on 'failure' at MW2009

These are my really rough notes from the unconference session at Museums and the Web, written up quickly in order to capture the essence of the discussion and open it up for comment.

Susan Chun, Dana Mitroff Silvers, Bruce Wyman and I began and were later joined by Seb Chan and Jennifer Trant.

I explained my motivation in suggesting the session – intelligent, constructive failure is important. Finding ways to create a space for that conversation isn't something we do well at the moment.

Susan started the conversation by pointing out that there were different definitions or types of failure. Defining 'failure' more precisely is useful.

Types of failures include: over budget, badly implemented, badly specified, future failures.

Dana pointed out that we needed to define success as well as defining failure. A more nuanced understanding of failure is important, especially when hoping to encourage more people to talk about failure. Discussion about choosing the right metrics for success – the right metrics may vary depending on whether you're a funder or a department or whoever.

Funding models can set you up for failure.

Bruce pointed out that it's not the failure that matters, it's what you do with the failure.

Some apparent failures may not really be failures.

Are you funding the process or the product?

Not having the mechanism for exposing the knowledge is a failure.

The definitions of failure and success need to include the net gain for an organisation or in new/improved processes as well as the product.

What kind of environment is needed so that people can publish judgements of their own success or failure?

Susan suggested the MCN project registry would be a good place for this information.

What if it was routine to talk about what failed or succeeded in each project? Funding should reward people who talk about failures. Discussion about space for reflection on 'lessons learned' in project summation.

Agency is important – you talk about the failures of your own projects, other people don't dob you in.

Dana – talking about failures in a project should be a normal part of MW papers.

Label it 'lessons learned', not 'failure'.

Susan – [Remove roadblocks about what happens if funders hear you think your project failed in some way -] Talk to funders about requiring an examination or reflection of each project for failure in the same way the issue of open source development was tackled. Pro-active approach!

Me: when you're putting in for funding, you should have to show that you've talked to people with similar projects about the lessons they learned.

Susan – put IMLS (Institute of Museum and Library Services) reports online. [A small but practical thing to do]. Change the culture of secrecy.

Funding can be a carrot and a stick. Without that, institutional change is hard.

Points of resistance (some summing up):
understanding how to define failure/success
culture of secrecy
fear of exposure to funders
lacking the jargon to describe failure (which would also help normalise the process of discussing it openly)

Jennifer – if there aren't any negative consequences, why can't you talk about it?

General discussion about the need for early, continual dialogue about projects. It's difficult to talk about failures if you're not already talking about the project. Paraphrasing Seb – talking about it already in an informal context, like a blog, may help here.

Iterative, transparent reporting is important. It also helps other people talk about failures.

Susan – other causes of failure are projects that never happened. Whether they missed their time, didn't get funding, whatever. Consider those as failures too, and talk about them. Everyone benefits, whether that's the person with the great idea that never got to see it happen, or people who've built on it later.

Talk about nascent projects. Exposing them to comment early can help prevent failure. Like the old crack about voting, public discussion about projects should happen early and often!

Hoarding ideas is pointless.

We need a template for talking about failure. Prompts or questions for consideration.

It's not just overall project failures, it can include institutional, departmental or structural failures.

Dana suggested confessional sessions, perhaps at the next Museums and the Web conference. Jennifer and Seb took it up, suggesting YouTube captures with disguised voices and silhouettes to make it easier, and encouraging discussion of failures by type or theme.

Discussion about the role of commentators, respondents in sessions. The voice of the one that didn't work.

Find an acceptable form of critical questions so that people can help prevent other projects failing, make the most of the experience out there.

Putting my money where my mouth is, one final comment from Seb was about a possible failure of the unconference sessions in not getting people together again at the end to report back. This was received constructively, and might happen during the final plenary.

Yay! Three Lazy Geeks shortlisted for dev8D prize

We're in the top five, whoop!

The reviewers said, "Really comprehensive treatment of the problem and associated issues. Worth pursuing I think… As a solution this is a good idea and was produced by genuine collaboration at the Dev8D event."

So a short but happy developer post from me. The whole experience was lots of fun, and it would never have worked without Ian Ibbotson and Pete Sefton. I think the thing that I like most about it is that it not only re-uses existing tools, it fits with how people already work. It's not "this application will change your life, but first you have to change your life". I know that the (mostly junior) academics I've mentioned it to have loved the idea, so it might have real users if it was developed, which would be lovely.

Tim Berners-Lee at TED on 'database hugging' and linked data

This TED talk by Tim Berners-Lee: The next Web of open, linked data is worth watching if you've been 'wondering whatever happened to the semantic web?', or what this 'linked data' is all about.

I've put some notes below – I was transcribing it for myself and thought I might as well share it. It's only a selection of the talk and I haven't tidied it because they're not my words to edit.

Why is linked data important?

Making the world run better by making this data available. If you know about some data in some government department you often find that, these people, they're very tempted to keep it, to hug your database, you don't want to let it go until you've made a beautiful website for it. … Who am I to say "don't make a website…" make a beautiful website, but first, give us the unadulterated data. Give us the raw data now.

You have no idea, the number of excuses people come up with to hang onto their data and not give it to you, even though you've paid for it.

Communicating science over the web… the people who are going to solve those are scientists, they have half-formed ideas in their head, but a lot of the state of knowledge of the human race at the moment is in database, currently not sharing. Alzheimer's scientists … the power of being able to ask questions which bridge across different disciplines is really a complete sea-change, it's very, very important. Scientists are totally stymied at the moment, the power of the data that other scientists have collected is locked up, and we need to get it unlocked so we can tackle those huge problems. If I go on like this you'll think [all data from] huge institutions but it's not. [Social networking is data.]

Linked data is about people doing their bit to produce their bit, and it all connecting. That's how linked data works. … You do your bit, everybody else does theirs. You may not have much data yourself, to put on there, but you know to demand it.

It's not just about the number of places where data comes. It's about connecting it together. When you connect it together you get this power… out of it. It'll only really pay off when everybody else has done it. It's called Linked Data, I want you to make it, I want you to demand it.

Competitions using APIs – any resources

I originally posted this on the Science Museum developers blog, filed under competition, mashups, requestforcomment.

The original impetus for creating this blog was to provide somewhere to talk about our plans, ask for feedback, and generally make the process of running a mashup competition using a set of object data created for an exhibition really transparent.

The project is close to signed-off, and I’ll go into more detail then, but in the meantime, here’s a post I sent to the MCG (museums computer group) email list:

Does anyone have good examples, bad examples, personal experience, whatever, on competition models, licensing, preservation, timelines, platforms, other public domain data sources, visualisation tools, etc? You can email me offlist if that’s easier, I can post a compiled list back here.

I was at JISC’s recent dev8D event and got some good ideas there, and I’m happy to share the research I’ve already done if anyone is interested.

Crowd-sourcing the translation of museum content into sign language?

We've been thinking about crowd-sourcing some British Sign Language (BSL) content for the Science Museum for a while now, particularly as we're running events with BSL interpreters and a new site ('Brought to Life') with some BSL content is due to launch in March. This post is both an attempt to think through some of the issues, and a question open to all – what do you think?

The idea
There are two related options – asking the public to share their translations of English text on the Science Museum websites or galleries into BSL with us, or asking people to contribute new content in BSL. Translations could include content like object captions (to view online or download to portable devices to take into the museum), exhibition information and interpretation, instructions for games like Launchpad – any existing content online or in the galleries.

Why it could be useful
Linda Ellis gave a presentation at the UK Museums Computer Group (MCG) meeting on 'Unheard Stories – Improving access for Deaf visitors' where she pointed out the distinctions between 'deaf' and 'Deaf', including that Deaf people use sign language as their first language and might not know English while deaf people probably become deaf later in life and English is their first language. Linda also said that Deaf people are one of the most excluded groups in our society. Deaf visitors surveyed for the Wolverhampton Arts and Museums Service said they wanted: concise written information; information in BSL; to explore exhibits independently; stories about local people and museum objects; events just for Deaf people (and dressing up, apparently).

(More notes on Linda's presentation and a link to her slides are in this earlier post).

I saw a great example of BSL content in museums at the 2009 Jodi Awards. The British Museum worked with the Frank Barnes School and media company Remark on a project where young deaf people produced signed curriculum resources for young deaf people. You can find out more and watch the videos at British Sign Language videos about the Museum.

Video goes mainstream?
One uncertainty is whether possible contributors would be comfortable creating and uploading video. The popularity of products like 'You Tube ready' digital compact cameras and the Flip would suggest that consumers are comfortable with the idea of creating and sharing video online.

The 2008 Horizon Report suggested 'grassroots video' will be adopted in one year or less:

Video is everywhere—and almost any device that can access the Internet can play (and probably capture) it. From user-created clips and machinima to creative mashups to excerpts from news or television shows, video has become a popular medium for personal communication. Editing and distribution can be done easily with affordable tools, lowering the barriers for production. Ubiquitous video capture capabilities have literally put the ability to record events in the hands of almost everyone. Once the exclusive province of highly trained professionals, video content production has gone grassroots.

In terms of understanding the context and perhaps expecting video online, a report The Valley looks towards 2009 in the BBC quotes Jim Patterson, product manager at YouTube, saying:

"This generation of users utilize the web differently and consume video differently. They grew up in an environment where digital, interactive media was ubiquitous. It has shaped how they use the web."

And Mr Patterson said this new video generation has also shaped the very nature of how YouTube is being used.

"Comscore is estimating that YouTube is the second largest search engine," he said.

"To this cohort, YouTube is their search engine. YouTube 'is' the web. Seeking the answer to any question, they prefer that the result be expressed as a video, so they go to YouTube."

That last point – "YouTube is their search engine. YouTube 'is' the web" – is pretty damn important, regardless of any other issues around museum content.

My questions

  • Am I imagining a need that isn't there? Are there enough people with British Sign Language as a first language who are interested in content at the Science Museum to make the project worthwhile? Is BSL content about particular objects or exhibitions something d/Deaf visitors would find useful?
  • Would anyone out there be interested in creating this content?
  • Is there enough acceptance of internet video? Is it easy enough for the public to produce and upload their own videos?

What do you think?

Catalhoyuk question

This will only be relevant to the archaeologists, I guess, but it has occurred to me to ask – what would you like to see in the Catalhoyuk archive reports? What information would either be useful or satisfy your curiosity?

In a wider sense, what can we (as IT geeks in the cultural heritage sector) learn from each other? What are we too scared to ask in case it's a stupid question, or because it seems too obscure? What don't we share because we assume that everyone else knows it already?