Tom Morris, SPARQL and semweb stuff – tech talk at Open Hack London

Tom Morris gave a lightning talk on ‘How to use Semantic Web data in your hack’ (aka SPARQL and semantic web stuff).

He’s since posted his links and queries – excellent links to endpoints you can test queries in.

The semantic web is often thought of as a long-promised magical elixir; he’s here to say it can be used now, and shows examples of queries that can be run against semantic web services. He’ll demonstrate two different online datasets and one database that can be installed on your own machine.

First – dbpedia – scraped lots of Wikipedia and put it into a database. dbpedia isn’t like your average database: you can’t draw a UML diagram of Wikipedia. It’s done in RDF and Linked Data. It can be queried in a language that looks like SQL but isn’t. SPARQL is a W3C standard, and they’re currently working on SPARQL 2.

Go to dbpedia.org/sparql – submit query as post. [Really nice – I have a thing about APIs and platforms needing a really easy way to get you to ‘hello world’ and this does it pretty well.]

[Line by line comments on the syntax of the queries might be useful, though they’re pretty readable as it is.]

‘select thingy, wotsit where [the slightly more complicated stuff]’

Can get back results in xml, also HTML, ‘spreadsheet’, JSON. Ugly but readable. Typed.
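As a rough illustration of the query-and-results cycle described above, here is a minimal Python sketch. It assumes the endpoint accepts `query` and `format` form fields and returns the standard SPARQL JSON results layout; the query, the helper names and the sample response are invented for illustration and are not taken from Tom's slides.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the talk

# Illustrative query (an assumption, not one of Tom's): ten things typed as cities.
QUERY = """
SELECT ?city ?label WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""


def build_post_body(query, fmt="application/sparql-results+json"):
    """Encode the query as form fields, ready to be sent as a POST body."""
    return urlencode({"query": query, "format": fmt}).encode("ascii")


def rows_from_sparql_json(document):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hand-written sample in the standard results layout, so the parsing step
# can be tried without a network connection.
SAMPLE_RESPONSE = {
    "head": {"vars": ["city", "label"]},
    "results": {
        "bindings": [
            {
                "city": {"type": "uri", "value": "http://dbpedia.org/resource/London"},
                "label": {"type": "literal", "xml:lang": "en", "value": "London"},
            }
        ]
    },
}

# To run it for real (needs network access):
#   import json, urllib.request
#   with urllib.request.urlopen(ENDPOINT, data=build_post_body(QUERY)) as resp:
#       rows = rows_from_sparql_json(json.load(resp))
```

Each value in the results carries its type (`uri` or `literal`), which is the ‘typed’ part of the output; the helper above throws that away to keep the rows simple.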

[Trying a query challenge set by others could be fun way to get started learning it.]

One problem – fictional places are in Wikipedia e.g. Liberty City in Grand Theft Auto.

Libris – how library websites should be
[I never used to appreciate how much most library websites suck until I started back at uni and had to use one for more than one query every few years]

Has a query interface through SPARQL

Comment from the audience: the BBC now have a SPARQL endpoint [as of the day before? Go BBC guy!].

Playing with mulgara, an open source Java triple store. [mulgara looks like a kinda faceted search/browse thing] Has its own query language called TQL which can do more interesting things than SPARQL. Why use it? Schemaless data storage. Is to SQL what dynamic typing is to static typing. [did he mean ‘is to sparql’?]
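To make ‘schemaless’ a bit more concrete, here is a toy in-memory triple store in Python. It is only a sketch of the general idea (everything is a subject, predicate, object triple, and queries are patterns with blanks); it is not mulgara's API or TQL, and the class, method names and data are made up.

```python
class TripleStore:
    """Toy triple store: no tables, no schema, just (subject, predicate, object)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self.triples
            if all(p is None or p == v for p, v in zip(pattern, triple))
        )


store = TripleStore()
store.add("London", "type", "City")
store.add("London", "country", "United Kingdom")
# A new kind of thing with new properties needs no schema change.
store.add("Liberty City", "type", "FictionalCity")
store.add("Liberty City", "appearsIn", "Grand Theft Auto")

everything_about_liberty_city = store.match(subject="Liberty City")
all_typed_things = store.match(predicate="type")
```

Adding the fictional city did not require altering a table definition first, which is roughly the dynamic-versus-static typing comparison from the talk.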

Question from audience: how do you discover what you can query against?
Answer: dbpedia website should list the concepts they have in there. Also some documentation of categories you can look at. [Examples and documentation are so damn important for the uptake of your API/web service.]

Coming soon [?] SPARUL – update language, SPARQL2: new features

The end!

[These are more (very) rough notes from the weekend’s Open Hack London event – please let me know of clarifications, questions, links or comments. My other notes from the event are tagged openhacklondon.

Quick plug: if you’re a developer interested in using cultural heritage (museums, libraries, archives, galleries, archaeology, history, science, whatever) data – a bunch of cultural heritage geeks would like to know what’s useful for you (more background here). You can comment on the #chAPI wiki, or tweet @miaridge (or @mia_out). Or if you work for a company that works with cultural heritage organisations, you can help us work better with you for better results for our users.]

There were other lightning talks on Pachube (pronounced ‘patchbay’, about trying to build the internet of things, making an API for gadgets because e.g. connecting hardware to the web is hard for small makers) and Homera (an open source 3d game engine).

Rasmus Lerdorf on Hacking with PHP – tech talk at Open Hack London

Same deal as my first post from today’s Open Hack London event – these are (very) rough notes, please let me know of clarifications, questions or comments.

Hacking with PHP, Rasmus Lerdorf

Goal of talk: copy and pastable snippets that just work so you don’t have to fight to get things that work [there’s not enough of this to help beginners get over that initial hump]. The slides are available at http://talks.php.net/show/openhack and these notes are probably best read as commentary alongside the code examples.

[Since it’s a hack day, some] Hack ideas: fix something you use every day; build your own targeted search engine; improve the look of search results; play with semantic web tools to make the web more semantic; tell the world what kind of data you have – if a resume, use hResume or other appropriate microformats/markup; go local – tools for helping your local community; hack for good – make the world a better place.

SearchMonkey and BOSS are blending together a little bit.

What we need to learn
With PHP – enough to handle simple requests; talk to backend datastore; how to parse XML with PHP, how to generate JSON, some basic JavaScript, a JavaScript utility library like YUI or jQuery.

Parsing XML: simplexml_load_file() – can load an entire URL or a local file.

Attributes on a node show up as an array. For namespaced elements, call children() on the node and pass the namespace as the argument.
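The talk's examples use PHP's SimpleXML; for readers following along in another language, here is a rough Python analogue using the standard library's ElementTree. The sample feed, the `parse_item` helper and the field names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up feed with one plain attribute and two namespaced child elements.
SAMPLE = """<rss version="2.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
  <channel>
    <item id="42">
      <title>Open Hack London</title>
      <geo:lat>51.5</geo:lat>
      <geo:long>-0.12</geo:long>
    </item>
  </channel>
</rss>"""

# Prefix-to-URI map used when looking up namespaced elements.
NAMESPACES = {"geo": "http://www.w3.org/2003/01/geo/wgs84_pos#"}


def parse_item(xml_text):
    """Pull the attribute, title and namespaced coordinates from the first item."""
    root = ET.fromstring(xml_text)
    item = root.find("channel/item")
    return {
        "id": item.attrib["id"],  # attributes behave like a dict
        "title": item.findtext("title"),
        "lat": float(item.find("geo:lat", NAMESPACES).text),
        "long": float(item.find("geo:long", NAMESPACES).text),
    }


parsed = parse_item(SAMPLE)
```

The shape is the same as in the PHP version: plain children and attributes are reached directly, while namespaced children need the namespace named explicitly.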

Now know how to parse XML, can get lots of other stuff.
Context extraction service, Yahoo – doesn’t get enough attention. Post all text, gives you back four or five key terms – can then do an image search off them. Or match ads to webpages.

Can use get or post (curl) – usually too much for get.

PHP to JavaScript on initial page load: json_encode() -> JavaScript.
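The same server-to-browser handoff can be sketched in Python, with `json.dumps` standing in for the PHP JSON encoder. The `render_page` helper, the `pageData` variable name and the sample values are hypothetical.

```python
import json


def render_page(data):
    """Embed server-side data in the page as a JavaScript variable."""
    # Escape "</" so a string containing "</script>" cannot close the tag early;
    # in JavaScript "<\/" reads back as "</".
    payload = json.dumps(data).replace("</", "<\\/")
    return "<script>var pageData = " + payload + ";</script>"


html = render_page({"place": "London", "visits": 3})
```

Because JSON is (very nearly) a subset of JavaScript literal syntax, the browser gets a ready-made object on first load with no extra request.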

Javascript to PHP (and back)
If you can figure out these six lines of code, you can write anything in the world. How every modern web application works.
Server-side php, client-side javascript.

‘There’s nothing to building web applications, you just have to break everything down into small enough chunks that it all becomes trivial’.

AJAX in 30 seconds.
Inline comments in code would help for people reading it without hearing the talk at the same time.

JavaScript libraries to the rescue
load maps API, create container (div) for the map, then fill it.

Form – on submit call return updateMap(); with new location.

YGeoRSS – if have GeoRSS file… can point to it.

GeoPlanet – assigns a WOE ID to a place. Locations are more than just a lat long – carry way more information. Basically gives you a foreign key. YQL is starting to make the web a giant database. Can make joins across APIs – woeid works as fk.

YQL – ‘combines all the APIs on the web into a single API’.
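The ‘WOE ID as foreign key’ idea can be shown without any API at all. In this Python sketch two lists stand in for the results of two different services, and the join is done on `woeid`; the data, the ID values and the `join_on_woeid` helper are all made up for illustration and are not real WOE IDs or YQL syntax.

```python
# Stand-in for a places lookup (IDs are invented, not real WOE IDs).
places = [
    {"woeid": 1001, "name": "London"},
    {"woeid": 1002, "name": "Paris"},
]

# Stand-in for results from a second, unrelated API that tags items by place.
photos = [
    {"title": "Tower Bridge at dusk", "woeid": 1001},
    {"title": "Canal boats", "woeid": 1003},
    {"title": "Louvre pyramid", "woeid": 1002},
]


def join_on_woeid(places, photos):
    """Inner join: pair each photo with the name of the place it is tagged with."""
    names = {place["woeid"]: place["name"] for place in places}
    return [
        (photo["title"], names[photo["woeid"]])
        for photo in photos
        if photo["woeid"] in names
    ]


joined = join_on_woeid(places, photos)
```

The photo tagged with an ID that the places list does not know about simply drops out, as it would in an SQL inner join.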

Add a cache – nice to YQL, and also good for demos etc. Copy and paste cache function from his slides – does a local cache on URL. Hashed with md5. Using PHP streams – #defn. Adding a cache speeds up developing when hacking (esp as won’t be waiting for the wifi). [This is a pretty damn good tip cos it’s really useful and not immediately obvious.]
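Rasmus's cache function is in his slides (in PHP); the sketch below is a loose Python version of the same idea, not a port of his code. It assumes a file-per-URL cache keyed by the md5 of the URL, with a hypothetical `max_age` in seconds and a `fetch` callable passed in so it can be tried offline.

```python
import hashlib
import os
import tempfile
import time


def cached_fetch(url, fetch, cache_dir, max_age=3600):
    """Return the body for url, calling fetch(url) only if no fresh copy is cached."""
    os.makedirs(cache_dir, exist_ok=True)
    # md5 is used only to turn the URL into a safe, fixed-length file name.
    key = hashlib.md5(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)

    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, "rb") as handle:
            return handle.read()

    body = fetch(url)
    with open(path, "wb") as handle:
        handle.write(body)
    return body


# Offline demonstration with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"payload for " + url.encode("utf-8")


demo_dir = tempfile.mkdtemp()
first = cached_fetch("http://example.com/a", fake_fetch, demo_dir)
second = cached_fetch("http://example.com/a", fake_fetch, demo_dir)
```

In real use `fetch` would be something like a small wrapper around `urllib.request.urlopen(url).read()`; the second call above is served from disk, which is what makes this handy on flaky hack-day wifi.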

XPath on URL using PHP’s OAuth extension

SearchMonkey – social engineering people into caring about semantic data on the web. For non-geeks, a search plug-in mechanism that will spruce up the search results page. Encourages people to add semantic data so their search result is as sexy as their competitors’ – so the goal is that people will start adding semantic data.

‘If you’re doing web stuff, and don’t know about microformats, and your resume doesn’t have hResume, you’re not getting a job with Yahoo.’

Question: how are microformats different to RDFa?
Answer: there are different types of microformats – some very specific ones, e.g. hResume, hCal. RDFa – adding arbitrary tags to a page, even if there’s no specific way to describe your data. But there’s a standard set of mark-ups for a resume, so you can use that. If your data doesn’t match anything at microformats.org then use RDFa or eRDF (?).