The BBC have a story on a new search engine, ‘Search site aims to rival Google’:
Called Cuil [pronounced ‘cool’], from the Gaelic for knowledge and hazel, its founders claim it does a better and more comprehensive job of indexing information online.
The technology it uses to index the web can understand the context surrounding each page and the concepts driving search requests, say the founders.
But analysts believe the new search engine, like many others, will struggle to match and defeat Google.
Instead of just looking at the number and quality of links to and from a webpage as Google’s technology does, Cuil attempts to understand more about the information on a page and the terms people use to search. Results are displayed in a magazine format rather than a list.
From the Cuil FAQ:
So Cuil searches the Web for pages with your keywords and then we analyze the rest of the text on those pages. This tells us that the same word has several different meanings in different contexts. Are you looking for jaguar the cat, the car or the operating system?
We sort out all those different contexts so that you don’t have to waste time rephrasing your query when you get the wrong result.
Different ideas are separated into tabs; we add images and roll-over definitions for each page and then make suggestions as to how you might refine your search. We use columns so you can see more results on one page.
They also provide ‘drill-downs’ on the results page.
Cuil will direct you to this additional information. By looking at these suggestions, you may discover search data, concepts, or related areas of interest that you hadn’t expected. This is particularly useful when you are researching a subject you don’t know much about and aren’t sure how to compose the “right” query to find the information you need.
I haven’t used it enough to work out exactly how it differentiates concepts (tabs) and ‘additional information’ (drill-downs/categories).
It does a good job on something like the Cutty Sark. Under ‘Explore by Category’ it offered:
- Buildings And Structures In Greenwich
- Sailboat Names
- Museums In London
- Neighbourhoods Of Greenwich
- School Ships
It picked up search results for Cutty Sark whisky and news of the Cutty Sark fire, but they weren’t reflected in the categories, and the search term didn’t trigger the tabs. The tabs kick in when you search for something like ‘orange’.
It didn’t do as well with ‘samian ware’ – the categories picked up all sorts of places and peoples (and, randomly, ‘American Films’), but while the search results all say that it’s ‘a kind of bright red Roman pottery’, that’s not reflected in the categories. Fair enough, there may not be enough information easily available online for ‘Types of Roman pottery’ to register as a category.
Incidentally, most of the results listed for ‘samian ware’ are just recycled entries from Wikipedia. It’s a shame the results aren’t filtered to remove entries that have just duplicated Wikipedia text. The FAQ says they don’t index duplicate content, so I guess the overall site or page is just different enough to be retained.
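To make that guess a bit more concrete, here is a minimal sketch of one common way to measure near-duplicate text: word shingles compared with Jaccard similarity. This is purely my own illustration, not Cuil’s method (the FAQ doesn’t describe how they detect duplicates). The function names, the three-word shingle size, the 0.8 threshold and the sample texts are all assumptions made up for the example.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def containment(part: set, whole: set) -> float:
    """Fraction of `part`'s shingles that also appear in `whole`."""
    if not part:
        return 0.0
    return len(part & whole) / len(part)


def is_near_duplicate(page: str, reference: str, threshold: float = 0.8) -> bool:
    """Hypothetical whole-page check: True if the pages are mostly the same."""
    return jaccard(shingles(page), shingles(reference)) >= threshold


# Made-up sample texts: a reference sentence and a page that wraps it
# in its own boilerplate.
REFERENCE = "Samian ware is a kind of bright red Roman pottery"
WRAPPED = (
    "Welcome to our site. Samian ware is a kind of bright red Roman pottery. "
    "Buy antiques here."
)

print(jaccard(shingles(WRAPPED), shingles(REFERENCE)))
print(containment(shingles(REFERENCE), shingles(WRAPPED)))
print(is_near_duplicate(WRAPPED, REFERENCE))
```

With these sample texts the whole-page similarity is 8/15 (roughly 0.53), which is below the assumed threshold, so the wrapped page would be kept even though every shingle of the copied sentence appears in it (containment of 1.0). A filter based on whole-page similarity would behave the way the ‘samian ware’ results suggest, while one based on containment would catch the copied text.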
It might take a while for museum content to appear in the most useful ways, but it looks like it might be a useful search engine for niche content. From the FAQ again:
We’ve found that a lot of Web pages have been designed with a small audience in mind—perhaps they are blogs or academic papers with specific interests or pages with family photos. We think that even though these pages aren’t necessarily for a wide audience, they contain content that one day you might need.
Our job is to index all these pages and examine their content for relevancy to your search. If they contain information you need, then they should be available to you.
It’s all sounding a bit semantic web-ish (and quite a bit ‘reacting to Google’-ish) and I’ll use it for a while to see how it compares to Google. The webmaster information doesn’t give any indication of how you could mark up content so that the relationships between terms in different contexts are clear, but I guess nice semantic markup would help.
Refreshingly, it doesn’t retain search info – privacy is one of their big differentiators from Google.