Open Objects – Page 29 – 'Every age has its orthodoxy and no orthodoxy is ever right.'

One step closer to intelligent searching?

The BBC have a story on a new search engine, 'Search site aims to rival Google':

Called Cuil [pronounced 'cool'], from the Gaelic for knowledge and hazel, its founders claim it does a better and more comprehensive job of indexing information online.

The technology it uses to index the web can understand the context surrounding each page and the concepts driving search requests, say the founders.

But analysts believe the new search engine, like many others, will struggle to match and defeat Google.

…

Instead of just looking at the number and quality of links to and from a webpage as Google's technology does, Cuil attempts to understand more about the information on a page and the terms people use to search. Results are displayed in a magazine format rather than a list.

From the Cuil FAQ:

So Cuil searches the Web for pages with your keywords and then we analyze the rest of the text on those pages. This tells us that the same word has several different meanings in different contexts. Are you looking for jaguar the cat, the car or the operating system?

We sort out all those different contexts so that you don't have to waste time rephrasing your query when you get the wrong result.

Different ideas are separated into tabs; we add images and roll-over definitions for each page and then make suggestions as to how you might refine your search. We use columns so you can see more results on one page.

They also provide 'drill-downs' on the results page.

Cuil will direct you to this additional information. By looking at these suggestions, you may discover search data, concepts, or related areas of interest that you hadn’t expected. This is particularly useful when you are researching a subject you don't know much about and aren't sure how to compose the "right" query to find the information you need.

I haven't used it enough to work out exactly how it differentiates concepts (tabs) and 'additional information' (drill-downs/categories).

It does a good job on something like the Cutty Sark. Under 'Explore by Category' it offered:

Buildings And Structures In Greenwich
Sailboat Names
Museums In London
Neighbourhoods Of Greenwich
School Ships

It picked up search results for Cutty Sark whisky and news of the Cutty Sark fire but they weren't reflected in the categories, and the search term didn't trigger the tabs. The tabs kick in when you search for something like 'orange'.

It didn't do as well with 'samian ware' – the categories picked up all sorts of places and peoples (and, randomly, 'American Films'), but while the search results all say that it's 'a kind of bright red Roman pottery', that's not reflected in the categories. Fair enough, there may not be enough information easily available online so that 'Types of Roman pottery' registers as a category.

Incidentally, most of the results listed for 'samian ware' are just recycled entries from Wikipedia. It's a shame the results aren't filtered to remove entries that have just duplicated Wikipedia text. The FAQ says they don't index duplicate content; I guess the overall site or page is just different enough to be retained.

It might take a while for museum content to appear in the most useful ways, but it looks like it might be a useful search engine for niche content. From the FAQ again:

We've found that a lot of Web pages have been designed with a small audience in mind—perhaps they are blogs or academic papers with specific interests or pages with family photos. We think that even though these pages aren't necessarily for a wide audience, they contain content that one day you might need.

Our job is to index all these pages and examine their content for relevancy to your search. If they contain information you need, then they should be available to you.

It's all sounding a bit semantic web-ish (and quite a bit 'reacting to Google-ish') and I'll use it for a while to see how it compares to Google. The webmaster information doesn't give any indication of how you could mark up content so the relationships between terms in different contexts are clear, but I guess nice semantic markup would help.

Refreshingly, it doesn't retain search info – privacy is one of their big differentiators from Google.

'Annoying adverts affect website traffic'

Via the BCS:

Nearly three quarters – 73 per cent – of internet users clicked away from a favourite website because of an annoying advert, according to research.

The survey, carried out by Opinion Matters for HowTo.tv, also revealed that 59 per cent no longer visited a particular website because of its advertising.

I use AdBlock for a serene and calm web experience, so when I use someone else's computer I'm always amazed at the sheer level of noise on the web and the crappiness of pages plastered with ads.

I was using Add-Art in conjunction with AdBlock, and will again when it supports Firefox 3 because it's a lovely idea. If you haven't heard of Add-Art before, check out this Webmonkey article until you can install it on Firefox 3.

Giant squid dissection via live video

I've been watching the recording of the live stream of the first ever public dissection by Museum scientists of a giant squid.

Congratulations to everyone involved at Museum Victoria, it's a great use of technology and a great approach to openness. The explanations were beautifully clear, and did a great job of contextualising the research, the process and the animal itself.

I love the paparazzi-style photo flashes as they rolled the trolley out onto the main floor.

Portable mapping applications make managers happy

This Webmonkey article, 'Multi-map with Mapstraction', about an 'open source abstracted JavaScript mapping library' called Mapstraction, is perfectly on target for organisations that worry about relying on one mapping provider.

How many of these have you heard as possible concerns about using a particular mapping service?

Current provider might change the terms of service
Your map could become too popular and use up too many map views
Current provider quality might get worse, or they might put ads on your map
New provider might have prettier maps
You might get bored of current provider, or come up with a reason that makes sense to you

They're all reasonable concerns. But look what the lovely geeks have made:

The promise of Mapstraction is to only have to change two lines of code. Imagine if you had a large map with many markers and other features. It could take a lot of work to manually convert the map code from one provider to another.
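To make the 'two lines of code' idea concrete, here's a minimal sketch of the general pattern in Python. It is not Mapstraction's actual API (which is JavaScript): the `Map` wrapper, `ProviderA`, `ProviderB` and their output formats are all hypothetical names I've made up for illustration. The point is only that your own code talks to one neutral interface, and the provider is chosen in a single place.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    lat: float
    lon: float
    label: str


class ProviderA:
    """Hypothetical provider that expects latitude first."""

    def render(self, markers):
        return [f"A:pin({m.lat},{m.lon},{m.label})" for m in markers]


class ProviderB:
    """Hypothetical provider that expects longitude first."""

    def render(self, markers):
        return [f"B:marker[{m.label}@{m.lon},{m.lat}]" for m in markers]


# The only place that knows which providers exist.
PROVIDERS = {"a": ProviderA, "b": ProviderB}


class Map:
    """Provider-neutral wrapper: calling code never touches a provider directly."""

    def __init__(self, provider_name):
        self._provider = PROVIDERS[provider_name]()
        self._markers = []

    def add_marker(self, lat, lon, label):
        self._markers.append(Marker(lat, lon, label))

    def render(self):
        # Translation into the provider's own conventions happens here.
        return self._provider.render(self._markers)


# Switching provider means changing only the name passed in on this line.
site_map = Map("a")
site_map.add_marker(51.5, -0.1, "Example marker")
print(site_map.render())
```

However many markers you add, the calls to `add_marker` stay the same whichever provider is behind the wrapper, which is what keeps a switch cheap.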

And functionality is being expanded. I liked this:

One of my favorite Mapstraction features is automatic centering and zooming. When called on a map with multiple markers, Mapstraction calculates the center point of all markers and the smallest zoom level that will contain all the markers.
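Out of curiosity, here's a rough sketch in Python of how automatic centering and zooming could work. This is my own guess at the technique, not Mapstraction's code. It assumes a Web Mercator map with 256-pixel tiles, takes the centre as the midpoint of the markers' bounding box (a simplification), ignores markers that straddle the 180° meridian or sit at the poles, and reads 'zoom level that will contain all the markers' as the closest zoom that still shows every marker. The function name, viewport size and maximum zoom are all assumptions.

```python
import math


def fit_markers(markers, width_px=640, height_px=480, tile_size=256, max_zoom=18):
    """Return ((lat, lon) centre, zoom) so every (lat, lon) marker fits the viewport."""
    lats = [lat for lat, _ in markers]
    lons = [lon for _, lon in markers]
    south, north = min(lats), max(lats)
    west, east = min(lons), max(lons)

    # Centre of the bounding box around all markers.
    centre = ((south + north) / 2, (west + east) / 2)

    def mercator_y(lat):
        # Vertical position as a fraction (0..1) of the whole Web Mercator world.
        s = math.sin(math.radians(lat))
        return 0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)

    # How much of the world the markers span, horizontally and vertically.
    lon_span = (east - west) / 360.0
    lat_span = abs(mercator_y(south) - mercator_y(north))

    # Start fully zoomed in and step out until the span fits in the viewport.
    for zoom in range(max_zoom, -1, -1):
        world_px = tile_size * 2 ** zoom
        if lon_span * world_px <= width_px and lat_span * world_px <= height_px:
            return centre, zoom
    return centre, 0


centre, zoom = fit_markers([(51.48, -0.01), (51.52, -0.10)])
print(centre, zoom)
```

A real library would also leave some padding around the outermost markers so they don't sit right on the edge of the map.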

Open source rocks! Not only can you grab the code and have someone maintain it for you if you ever need to, but it sounds like a labour of geek love:

Mapstraction is maintained by a group of geocode lovers who want to give developers options when creating maps.

Microupdates and you (a.k.a. 'twits in the museum')

I was trying to describe Twitter-esque applications for a presentation today, and I wasn't really happy with 'microblogging' so I described them as 'micro-updates'. Partly because I think of them as a bit like Facebook status updates for geeks, and partly because they're a lot more actively social than blog posts.

In case you haven't come across them, Twitter, Pownce, Jaiku, tumblr, etc, are services that let you broadcast short (140 characters) messages via a website or mobile device. I find them useful for finding like-minded people (or just those who also fancy a drink) at specific events (thanks to Brian Kelly for convincing me to try it).

You can promote a 'hash tag' for use at your event – yes, it's a tag with a # in front of it, low tech is cool. Ideally your tag should be short and snappy yet distinct, because it has to be typed in manually (mistakes happen easily, especially from a mobile device) and it's using up precious characters. You can use tools like Summize, hashtags, Quotably or Twemes to see if anyone else has used the same tag recently.

You can also ask people to use your event tag on blog posts, photos and videos to help bring together all the content about your event and create an ad hoc community of participants. Be aware that especially with Twitter-type services you may get fairly direct criticism as well as praise – incredibly useful, but it can seem harsh out of context (e.g. in a report to your boss).

More generally, you can use the same services above to search twitter conversations to find posts about your institution, events, venues or exhibitions. You can add in a search term and subscribe to an RSS feed to be notified when that term is used. For example, I tried http://summize.com/search?q="museum+of+london" and discovered a great review of the last 'Lates' event that described it as 'like a mini festival'. You should also search for common variations or misspellings, though they may return more false positives. When someone tweets (posts) using your search phrase it'll show up in your RSS reader and you can then reply to the poster or use the feedback to improve your projects.
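If you want to keep an eye on several variations at once, the search URLs are easy to generate. Here's a small Python sketch that follows the URL pattern in my example above; the function names and the list of variations are made up for illustration, and I'm assuming the service accepts a URL-encoded quoted phrase (`%22` is just an encoded double quote).

```python
from urllib.parse import quote_plus

# URL pattern taken from the example search above.
SEARCH_BASE = "http://summize.com/search?q="


def build_search_url(phrase, exact=True):
    """Build a search URL for a phrase, quoting it for an exact-phrase match."""
    term = f'"{phrase}"' if exact else phrase
    return SEARCH_BASE + quote_plus(term)


def search_urls(name, variations=()):
    """One URL for the main name plus one per variation or misspelling."""
    return [build_search_url(phrase) for phrase in (name, *variations)]


# Hypothetical variations: a run-together name and a common misspelling.
for url in search_urls("museum of london", ["museumoflondon", "musuem of london"]):
    print(url)
```

Each URL can then be opened once to check the results look sensible before you subscribe to its feed.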

This can be a powerful way to interact with your audience because you can respond directly and immediately to questions, complaints or praise. Of course you should also set up Google Alerts for blog posts and other websites, but micro-update services allow for an incredible immediacy and directness of response.

As an example, yesterday I tweeted (or twitted, if you prefer):

me: does anyone know how to stop firefox 3 resizing pages? it makes images look crappy

I did some searching [1] and found a solution, and posted again:

me: aha, it's browser.zoom.full or "View → Zoom → Zoom Text Only" on windows, my firefox is sorted now

Then, to my surprise, I got a message from someone involved with Firefox [2]:

firefox_answers: Command/Control+0 (zero, not oh) will restore the default size for a page that's been zoomed. Also View->Zoom->Reset

me: Impressed with @firefox_answers providing the answer I needed. I'd been looking in the options/preferences tabs for ages

firefox_answers: Also, for quick zooming in & out use control plus or control minus. in Firefox 3, the zoom sticks per site until you change it.

Not only have I learnt some useful tips through that exchange, I feel much more confident about using Firefox 3 now that I know authoritative help is so close to hand, and in a weird way I have established a relationship with them.

Finally, twitter et al have a social function – tonight I met someone who was at the same event I was last week who vaguely recognised me because of the profile pictures attached to Twitter profiles on tweets about the event. Incidentally, he's written a good explanation of twitter, so I needn't have written this!

[1] Folksonomies to the rescue! I'd been searching for variations on 'firefox shrink text', 'firefox fit screen', 'firefox screen resize' but since the article that eventually solved my problem called it 'zoom', it took me ages to find it. If the page was tagged with other terms that people might use to describe 'my page jumps, everything resizes and looks a bit crappy' in their own words, I'd have found the solution sooner.

[2] Anyone can create a username and post away, though I assume Downing Street is the real thing.

London Transport Museum's Flickr scavenger hunt

I haven't looked at the whole site yet but I loved the idea so I wanted to post it while you could still vote (until 20 July 2008):

London Transport Museum is hosting a Flickr scavenger hunt on Sunday 6th July in Covent Garden as part of the events for the London Festival of Architecture 2008. Focusing on the transport network's quirky design features, in a race against time teams of photographers will have to unlock a series of cryptic clues in order to snap roundels, station murals and much more. Have you got what it takes to get all the shots and make it back to the Museum? Prizes for the first team back (with the most correct answers), and – voted by the public – the best team and the best picture uploaded on Flickr.

We're all suckers for museums

A lovely post on 'Why Museums Are Important to Me' that also contains a reminder of the need to consciously communicate properly with those outside the sector:

When you work in a museum you sometimes forget what it's like NOT to work in a museum. Museums can be very absorbing little worlds, because they have such odd functions and corners and people in them.

Museums might not offer the best working conditions in the world (especially if you work in a profession that's usually better paid), but there are good reasons why most people who work in museums love their jobs.

20% time – an experiment (with some results)

A company called Atlassian have been experimenting with allowing their engineers 20% of their time to work on free or non-core projects (a la Google). They said:

You see, while everyone knows about Google's 20% time and we've heard about all the neat products born from it (Google News, GMail etc) – we've found it extremely difficult to get any hard facts about how it actually works in practice.

So they started with a list of questions they wanted to answer through their experiment, and they've been blogging about it at http://blogs.atlassian.com/developer/20_percent_time/. It makes for interesting reading, and it's great to see some real evidence starting to emerge.

Hat tip: Tech-Ed Collisions.

Learn web standards for free

So now you have no excuse – it's free, accessible, and "designed to give anyone a solid grounding in web design/development, no matter who they are" (and what they might/not already know):

Learning Web Standards just got easier. Opera's new Web Standards Curriculum is a complete course to teach you standards-based web development, including HTML, CSS, design principles and background theory, and JavaScript basics.

Interestingly, the introduction says, "I am mainly aiming this at universities, as I believe the standards of education in web standards to be somewhat lacking at many universities".

More at Learn to build a better Web with Opera.

The Future of the Web with Sir Tim Berners-Lee @ Nesta

My notes from the Nesta event, The Future of the Web with Sir Tim Berners-Lee, held in London on July 8, 2008.

[Image: Panel at 'The Future of the Web' with Sir Tim Berners-Lee, Nesta]

As usual, let me know of any errors or corrections; comments are welcome, and comments in [square brackets] are mine. I wanted to get these notes up quickly so they're pretty much 'as is', and they're pretty much about the random points that interested me and aren't necessarily representative. I've written up notes from a previous talk by Tim Berners-Lee in March 2007, which go into more detail about web science.

[Update: the webcast is online at http://www.nesta.org.uk/future-of-web/ so you might as well go watch that instead.]

The event was introduced by NESTA's CEO, Jonathan Kestenbaum. Explained that online contributions from the pre-event survey, and from the (twitter) backchannel would be fed into the event. Other panel members were Andy Duncan from Channel 4 and the author Charlie Leadbeater though they weren't introduced until later.

Tim Berners-Lee's slides are at http://www.w3.org/2008/Talks/0708-ws-30min-tbl/.

So, onto the talk:
He started designing the web/mesh, and his boss 'didn't say no'.

He didn't want to build a big mega system with big requirements for protocols or standards, hierarchies. The web had to work across boundaries [slide 6?]. URIs are good.

The World Wide Web Consortium as the point where you have to jump on the bob sled and start steering before it gets out of control.

Producing standards for current ideas isn't enough; web science research is looking further out. Slide 12 – Web Science Research Initiative (WSRI) – analysis and synthesis; promote research; new curriculum.

Web as blockage in sink – starts with a bone, stuff builds up around it, hair collects, slime – perfect for bugs, easy for them to get around – we are the bugs (that woke people up!). The web is a rich environment in which to exist.

Semantic web – what's interesting isn't the computers, or the documents on the computers, it's the data in the documents on the computers. Go up layers of abstraction.

Slide on the Linked Open Data movement (dataset cloud) [Anra from Culture24 pointed out there's no museum data in that cloud].

Paraphrase, about the web: 'we built it, we have a duty to study it, to fix it; if it's not going to lead to the kind of society we want, then tweak it, fix it'.

'Someone out there will imagine things we can't imagine; prepare for that innovation, let that innovation happen'. Prepare for a future we can't imagine.

End of talk! Other panelists and questions followed.

Charles Leadbeater – talked about the English Civil War, recommends a book called 'The World Turned Upside Down'. The bottom of society suddenly had the opportunity to be in charge. New 'levellers' movement via the web. Participate, collaborate, (etc) without the trappings of hierarchy. 'Is this just a moment' before the corporate/government Restoration? Iterative, distributed, engaged with practice.

Need new kinds of language – dichotomies like producer/consumer are disabling. Is the web – a mix of academic, geek, rebel, hippie and peasant village cultures – a fundamentally different way of organising, will it last? Are open, collaborative working models that deliver the goals possible? Can we prevent creeping re-regulation that imposes old economics on the new web? e.g. ISPs and filesharing. Media literacy will become increasingly important. His question to TBL – what would you have done differently to prevent spam while keeping the openness of the web? [Though isn't spam more of a problem for email at the moment?]

Andy Duncan, CEO of Channel 4 – web as 'tool of humanity', ability for humans to interact. Practical challenges to be solved. £50million 4IP fund. How do we get, grow ideas and bring them to the wider public, and realise the positive potential of ideas. Battle between positive public benefit vs economic or political aspects.

The internet brings more/different perspectives, but people are less open to new ideas – they get cosy, only talk to like-minded people in communities who agree with each other. How do you get people engaged in radical and positive thinking? [This is a really good observation/question. Does it have to do with the discoverability of other views around a topic? Have we lost the serendipity of stumbling across random content?]

Open to questions. 'Terms and conditions' – all comments must have a question mark at the end of them. [I wish all lectures had this rule!]

Questions from the floor: 1. why is the semantic web taking so long; 2. 3D web; 3. kids.

TBL on semantic web – lots of exponential growth. SW is more complicated to build than HTML system. Now has standard query language (SPARQL). Didn't realise at first that needed a generic browser and linked open data. (Moving towards real world).

[This is where I started to think about the question I asked, below – cultural heritage institutions have loads of data that could be open and linked, but it's not as if institutions will just let geeks like me release it without knowing where and why and how it will be used – and fair enough, but then we need good demonstrators. The idea that the semantic web needs lots of acronyms (OWL, GRDDL, RDF, SPARQL) in place to actually happen is a perception I encounter a lot, and I wanted an answer I could pass on. If it's 'straight from the horse's mouth', then even better…]

Questions from twitter (though the guy's laptop crashed): 4. will Google own the world? What would Channel 4 do about it?; 5. is there a contradiction between [collaborative?] open platform and spam?; 6. re: education, in era of mass collaboration, what's the role of expertise in a new world order? [Ooh, excellent question for museums! But more from the point of view of them wondering what happens to their authority, especially if their collections/knowledge start to appear outside their walls.]

AD: Google 'ferociously ambitious in terms of profit', fiercely competitive. They should give more back to the UK considering how much they take out. Qu to TBL re Google, TBL did not bite but said, 'tremendous success; Google used science, clustering algorithms, looked at the web as a system'.

CL re qu 5 – the web works best through norms and social interactions, not rules. Have to be careful with assumption that can regulate behaviour -> 'norm based behaviour'. [But how does that work with anti-social individuals?]

TBL re qu 6: e.g. MIT Courseware – experts put their teaching materials on the web. Different people have different levels of expertise [but how are those experts recognised in their expert context? Technology, norms/links, a mixture?]. More choice in how you connect – doesn't have to be local. Being an expert [sounds exhausting!] – connect, learn, disseminate – huge task.

Questions from the floor: 7. ISPs as villains, what can they do about it?; 9. why can't the web be designed to use existing social groups? [I think, I was still recovering from asking a question]

TBL re qu 7 and ISPs: 'give me non-discriminatory access and don't sell my clickstream'. [Hoorah!]

So the middle question (Question 8) was me. It should have been something like 'if there's a tension between the top-down projects that don't work, and simple protocols like HTML that do, and if the requirements of the 'Semantic Web' are top-down (and hard), how do we get away from the idea that the semantic web is difficult to just have the semantic web?'* but it came out much more messily than that as 'the semantic web as proposed is a top-down system, but the reason the web worked was that it was simple, easy to participate, so how does that work, how do we get the semantic web?' and his response started "Who told you SW is top down?". It was a leading question so it's my fault, but the answer was worth asking a possibly stupid/leading question.

His full answer [about two minutes, starting 20'20" into the Q&A video] was:

'Who on earth told you the semantic web was a top-down designed system? It's not. It is totally bottom-out. In fact the really magic thing about it is that it's middle-out as well. If you imagine lots of different data systems which talk different languages, it's a bit like imagine them as a quilt of those things sewn together at the edges. At the bottom level, you can design one afternoon a little data system which uses terms and particular concepts which only you use, and connect to nobody else. And then, in a very bottom-up way, start meeting more and more people who'll start to use those terms, and start negotiating with people, going to, heaven forbid, standards bodies and committees to push, to try to get other people to use those terms. You can take an existing set of terms, like the concepts when you download a bank statement, you'll find things like the financial institution and transaction and amount have pretty much been defined by the banks, you can take those and use those as semantic web terms on the net.

And if you want to, you can do that at the very top level because you might decide that it's worth everybody having exactly the same URI for the concept of latitude, for the number you get out of the GPS, and you can join the W3C interest group which has gotten together people who believe in that, and you've got the URI, [people] went to a lot of trouble to make something which is global. The world works like that plug of stuff in the sink, it's a way of putting together lots and lots of different communities at different levels, only some of them, a few of them are global. The global communities are hard work to make. Lots and lots and lots of them are local, those are very easy to make. Lots of important benefits are in the middle. The semantic web is the first technology that's designed with an understanding of that's how the world is, the world is a scale-free, fractal if you like, system. And that's why it's all going to work.'

[So I was asking 'how do we get to the semantic web' in the museum sector – we can do this. Put a dataset out there, make connections to the organisation next to you (or get your users to by gathering enough anonymised data on how they link items through searching and browsing). Then make another connection, and another. We could work at the sector (national or international) level too (stable permanent global identifiers would be a good start) but start with the connections. "Small pieces loosely joined" -> "small ontologies, loosely joined". Can we make a manifesto from this?

There's also a good answer in this article, Sir Tim Talks Up Linked Open Data Movement on internetnews.com.

"He urged attendees to look over their data, take inventory of it, and decide on which of the things you'd most likely get some use out of re-using it on the Web. Decide priorities, and benefits of that data reuse, and look for existing ontologies on the Web on how to use it, he continued, referring to the term that describes a common lexicon for describing and tagging data."

Anyway, on with the show.]

[*Comment from 2015: in hindsight, my question speaks to the difficulties of getting involved in what appeared to be distant and top-down processes of ontology development, though it might not seem that distant to someone already working with W3C. And because museums are tricky, it turns out the first place to start is getting internal museum systems to talk to each other – if you can match people, places, objects and concepts across your archive, library and museum collections management systems, digital asset management system and web content management system, you're in a much better position to match terms with other systems. That said, the Linking Museums meetups I organised in London and various other museum technology forums were really helpful.]
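[To give a flavour of what 'matching terms across systems' can mean at its very simplest, here's a Python sketch that pairs up records from two systems when their labels are the same after tidying up case, accents and spacing. The record ids and labels are invented, and real matching needs far more than this (dates, disambiguation, human review), so treat it as a first pass only.]

```python
import unicodedata


def normalise(label):
    """Lower-case, strip accents and collapse whitespace so near-identical labels compare equal."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(text.casefold().split())


def match_terms(system_a, system_b):
    """Map ids in system_a to ids in system_b that share a normalised label.

    Both arguments are dicts of {record id: label}. If several records in
    system_b share a label, the last one wins; a real tool would flag those
    for review instead.
    """
    index_b = {normalise(label): ident for ident, label in system_b.items()}
    matches = {}
    for ident, label in system_a.items():
        key = normalise(label)
        if key in index_b:
            matches[ident] = index_b[key]
    return matches


# Hypothetical records from a collections system and a web content system.
collections = {"obj-1": "Samian  Ware", "obj-2": "Cutty Sark"}
web_pages = {"page-9": "samian ware", "page-12": "Roman London"}
print(match_terms(collections, web_pages))
```

[Anything left unmatched is the interesting pile: those are the terms where the systems genuinely disagree and someone has to decide which label wins.]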

Questions from the floor: 10. do we have enough "bosses who don't say no"?; 11. web to solve problems, social engineering [?]; 12. something on Rio meeting [didn't get it all].

TBL re 10 – he can't emulate other bosses but he tries to have very diverse teams, not clones of him/each other, committed, excited people and 'give them spare time to do things they're interested in'. So – give people spare time, and nurture the champions. They might be the people who seem a bit wacky [?] but nurture the ones who get it.

Qu 11 – conflicting demands and expectations of web. TBL – 'try not to think of it as a thing'. It's an infrastructure, connections between people, between us. So, are we asking too much of us, of humanity? Web is reflection of humanity, "don't expect too little".

TBL re qu 12 – internet governance is the Achilles heel of the web. No permission required except for domain name. A 'good way to make things happen slowly is to get a bureaucracy to govern it'. Slowness, stability. Domain names should last for centuries – persistence is a really important part of the web.

CL re qu 11 – possibilities of self-governance, we ask too little of the web. Vision of open, collaborative web capable of being used by people to solve shared problems.

JK – (NESTA) don't prescribe the outcome at the beginning, commitment to process of innovation.

Then Nesta hosted drinks, then we went to the pub and my lovely mate said "I can't believe you trolled Tim Berners-Lee". [I hope I didn't really!]