Semantic Web ThinkTank

I went to the Semantic Web Think Tank meeting on "Social Software and the User Experience of the Semantic Web" in Brighton on Thursday. I'm still thinking about the discussions a lot, but here are some of my thoughts. This isn't an official report of the day, and they're in entirely random order and mixed up with other issues I've been thinking about lately.

We were asked to introduce ourselves and briefly describe our interest in the Semantic Web at the start of the session. I explained that I have a long-standing interest in user experiences online, and in the presentation of collections online. I've been interested in discovering whether we're actually using the most effective schema, formats, navigation and interfaces for our audiences for a long time, so sessions like this are a delight.

User-generated content
A lot of the conversation was about user-generated content rather than the users' experience of the semantic web, possibly because museums are thinking hard about user-generated content at the moment.

We talked about models for the presentation of user-generated content that would suggest users are comfortable distinguishing between content generated within an institution and that written by other users, such as Amazon reviews.

I didn't raise this at the time but while the overall quality of Amazon reviews and Wikipedia entries is encouraging, the Yahoo! Answers service makes me despair for humanity. Maybe it doesn't have the snob value of other social software sites like Amazon or Wikipedia, but the answers tend to be pretty low quality and sometimes possibly even maliciously wrong. Importantly, stupid or bad answers don't tend to be rated down the way a less insightful Amazon review would be.

However, overall it does show that there are existing models of user-generated content that we can follow – we don't have to invent them to start publishing user-generated content on our museum websites.

As an aside, hopefully our users don't discount 'official' museum content the way users tend to disregard the publisher blurbs on Amazon – we're told that users regard museums as trustworthy and 'objective' and, I would hope, with some affection.

I'd never really thought about using folksonomies as a form of feedback that would inform the process of creating ontologies, but once Areti raised it I started thinking about it. I guess I've always seen them as serving slightly different purposes, and as I don't think they compete in any sense, I hadn't seen the need to change how ontologies are constructed. I guess it depends – while internal ontologies don't need to be user-friendly, museums have a tendency to re-use them as navigation and information architecture on a website, where they do need to suit the audiences.

There was some discussion about the barriers to participation for museums and the possibility of resistance from curators and other museum staff. I've been lucky that so far I haven't encountered any resistance, but I think generally we can use internal goodwill towards engaging new, non-traditional or disengaged audiences as a motivator. Our barriers to participation are those old favourites, time and money.

I think I must have been hungry because I started thinking about collections delivered via RSS as 'home delivery' from a range of menus, and about traditional online collections as going to a restaurant – the restaurant chooses the range of items you can order and in what format they'll be delivered.

Not all users are equal!
User-generated content isn't written by random voices from an undifferentiated mass of users. Reputation and trust are important, whether 'Real Name' reviewers on Amazon, established authors on Wikipedia, or eBay sellers with good feedback. What impact might that have on museum content that's 'leaked out' and lost its original context?

Knowing what our users want matters, and simultaneously doesn't
Towards the end of the morning session I decided that we can't predict how semantic web users will use unfettered access, so maybe we should just build it and see what happens, instead of trying to second guess them. In a way, the Semantic Web is post-User Centred Design because we're not designing the applications but providing repositories of data that can be used in what could be called User Created Applications. It's not that users don't matter, it's that now isn't the time to make assumptions about what they want – our dialogue with them should be very open-ended.

As for what 'it' is – maybe a repository of objects published in a sector-wide digital object model or schema? There was some discussion of whether objects could be published in microformats, but I think they're too big for that. Otoh, if we have a repository where each object has a permanent URI, we could put selected data into microformats that can refer back to the URI.

We can better predict how user-generated content might relate to our existing infrastructure so we should try to cater to known models and requirements.

The semantic web can cause problems for museums funded according to the number of visitors through their door or to their website. We need to redefine the measures of success to incorporate content that's used outside the infrastructure of the originating museum; or we can refer users on to commercial services such as picture libraries. In terms of development, we can aim to create re-usable and sustainable infrastructures in any new applications developed so they can be used to deliver content both to the target audience/application and beyond.

Other random thoughts
I've also realised that maybe we need to take a step back and ask "do we actually know who our users are?" before we can assess the effectiveness of online collections. There are generally accepted groupings of users we talk about, but do they reflect reality? It may well be that we're on the right track but it would be good to confirm this. Interestingly, since I started writing this post, I've noticed that my old workmates Jonny Brownbill and Darren Peacock are presenting a session titled Audiences, Visitors and Users: Reconceptualising users of museum online content and services at MW2007 so hopefully research in this area is moving forward.

I can relate this to the discussions on Thursday but actually it came out of a conversation I had with my workmate Jeremy beforehand: does Australia's history with models of distance education like the "School of the Air" mean that Australian museums have a different understanding of how to present collections online? Australian museums have had extensive collections online for years, possibly a lot earlier than museums in Europe or North America.

Update: the workshop report is now online.

"Shoppers are likely to abandon a website if it takes longer than four seconds to load, a survey suggests.

It found 75% of the 1,058 people asked would not return to websites that took longer than four seconds to load." Akamai study as reported on the BBC.

It's a study of online shopping habits, but I wonder if the same holds for cultural sector sites. I guess that says something either about my knowledge of existing audience evaluation or the paucity of existing information.

The article doesn't report whether the study analysed the results by gender, but this article, Key Website Research Highlights Gender Bias, suggests that gender makes a big difference to the user experience:

"Despite the parity of target audience, the results found that 94% of the sites displayed a masculine orientation with just 2% displaying a typically female bias."

Interesting use of location-aware devices at the Tower of London.

"The new game employs HP's iPAQ handheld devices and location sensors to trigger the appropriate digital file, which includes voices, images, music and clues.

HP said that developing the new game has helped it to explore opportunities for new products and services that will emerge around the delivery of location and other context-based experiences."

Via the BCS.

I've always wanted to do something like a 'museum outside the walls' where hand-held devices or mobile phones deliver content based on your location. They could be used in walking tours, or signs could let people know that content is available. London has so many layers of history, and the Museum has so much content about London's histories.

User-generated content and the general public vs invited experts

We've been having discussions at work about the promises and challenges of user-generated content. In that light, this article is quite timely:

"The estranged founder of Wikipedia, the online encyclopaedia written entirely by members of the public, is to launch a rival that he says is less likely to be riddled with errors.
Larry Sanger says that vast swaths of the anarchic encyclopaedia he helped create in 2001 are in desperate need of an editor – and that is what he is promising for his new project.

Mr Sanger has begun signing up academics furious at the mistakes and generalisations they find on Wikipedia's articles on their specialist subjects, and vowed to give these experts a special role to shape articles on"


Catalhoyuk diaries: What I did on my summer holidays

I was on site at Çatalhöyük for two weeks, and while I was there I contributed to the Catalhoyuk blog.

For me, it was a good opportunity to explain what I do on site – people are often confused about why a database developer would be going out to work on site in Turkey.

After that, I spent a few days in Istanbul, where I went to the Çatalhöyük exhibition and also to Istanbul Modern.

Then I caught a train to Bucharest, where I started my holiday. I visited Romania, Moldova, Transdniestr, Ukraine, and finally flew home from Krakow a month later.

Clay pipe recording at MoLAS and "Clay tobacco pipe makers' marks from London" website

[Update, September 2017: the site appears not to be supported by the Museum of London or MOLA, but there's an archived version at that should provide access to equivalent pages of the links listed below.]

[Update, December 2011: if you're interested in clay pipes, you may be interested in Locating London's Past. The site also has an article that explains how Museum of London Archaeology (MoLA) Datasets – including clay pipes and glass – have been incorporated into the site.  NB: other than adding these links, I haven't updated the original 2006 paper below, so it doesn't include any enhancements made for this new work.  On a personal note, it's lovely to see that the sites, and the backend work behind them, still have value.]

Wheel symbol with pellets between the spokes, c 1610-40.

I'm just back from giving a paper at the Society for Clay Pipe Research Conference, held at the LAARC today. I thought I'd share the content of my paper online so that other people interested in digitising and publishing collections online could see how one particular project was implemented.

Some interesting feedback from the question session afterwards was that other archaeological units, museums or researchers might be interested in publishing records to the same site. In that case, I'd be happy to review the structures so they could be generalised (for other identifiers, for example) and publish them as an open standard along with more detailed information on the digitisation process.

Anyway, here's the text of the paper:

Clay pipe recording at MoLAS and the stamped makers’ mark website


Mia Ridge, Database Developer, Museum of London


SCPR ANNUAL CONFERENCE, September 16th 2006


London Archaeological Archive and Research Centre, Mortimer Wheeler House



The paper discusses the process from initial specification through requirements gathering, database design, development of the database application and website, to publication online.


The project began with a proposal to create a database of clay tobacco pipe makers' marks from London:

"…a physical and digital database of clay tobacco pipe makers’ marks found in excavated contexts from London, dating to between c 1580 and 1910. This will encompass examples of makers’ marks, both stamped and moulded, on pipes made in London and imported from further afield, both in the UK and on the Continent. … The digital version of the database will be made available online, as part of the MoLAS website"

The work had two parts – enhancing the MoLAS Oracle database so that it could record more detailed information about the maker's marks; and creating a website to publish the marks and related images and information online.

Requirements gathering

'Requirements gathering' is the process of scoping and defining a project. The first step towards this is to define the internal and external stakeholders; the second is to determine their requirements. Internal stakeholder requirements include modified forms and structure for recording enhanced data and analysis, while external requirements relate to the publication of the data to defined groups of website users. It is important to define the targeted users of your website so that its content, site architecture and functionality can be tailored to them.

The targeted users of the site were largely determined by the subject matter. The main users will be specialists, followed by general adults. Site functionality, considered as search or browse capabilities, was determined as a balance between the purpose of the site, the needs of its visitors and the content and infrastructure we have available.

The database and website also had to be expandable to provide for greater temporal or geographic coverage, including collections throughout the Greater London area. Finally, in order to design data structures that would best meet the needs of the project, I had to consider the nature of the material to be recorded.

The first discussions with Jacqui and Tony were about the requirements for the website. We then met to review the existing Oracle data structures and discuss the necessary changes. I asked lots of questions about how makers' marks related to clay pipes – where, what kind and how many might appear on a pipe? It was important to understand how they varied, and which properties of position, type and method were significant, as well as to understand the exceptions. As you know, one 'IS' stamp is not necessarily the same as another 'IS' stamp – the trick is to enable the application to understand the difference. In this process, the aim is not to uncover the detail of the subject but to understand how its typologies are constructed.

Once the requirements had been determined, data structures were designed accordingly. These were presented to Jacqui and Tony, and revised in response to their feedback. Prototype forms were then designed to allow data entry, and the same process of feedback and modification followed. Significant changes were made after testing and further modifications were made as necessary during the implementation process as the practical implications of the modifications became clear.

One of the challenges of database design is balancing the benefits of recording in a more structured way, which provides for much greater flexibility in analysis, search and publication, against the smaller learning curve and greater efficiency in data entry of less structured recording. For example, as different types of information are separated out of free text or general comments into more precise fields, the time required to record each entry increases.

As the data structures were finalised, queries were run to populate the modified structures with existing data, where possible.

Database design and development

The MoLAS Oracle database is used by our archaeologists and specialists to record field, find and environmental data. It has been developed in-house over many years and is one of the largest databases of its kind in the UK. As the database and forms are maintained in-house, we are able to modify it to meet our needs as required for projects or day-to-day business.

In the MoLAS database an individual pipe record must have a unique combination of sitecode, context, accession number and form that is different from any other pipe record. This unique identifier forms the basis of the database application. This combination of identifiers, called a 'primary key', can be used to create links from a pipe with a particular mark to possible pipe makers. The sitecode can also be used to link to information about the particular excavation. Should a specialist desire, they can also link to other finds from the same context as well as related excavation and environmental data. The existing table structure was modified to support recording clay pipes and maker's marks in a more semantically structured way, with more detail and additional attributes.
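As a rough illustration of a composite primary key, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the Oracle database, and the table name, column names, sitecode and form code are all made up for the example rather than taken from the MoLAS structures.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle database; every table and
# column name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pipe (
        sitecode  TEXT    NOT NULL,
        context   INTEGER NOT NULL,
        accession INTEGER NOT NULL,
        form      TEXT    NOT NULL,
        mark      TEXT,
        PRIMARY KEY (sitecode, context, accession, form)
    )
    """
)
conn.execute("INSERT INTO pipe VALUES ('ABC01', 120, 1, 'AO10', 'IS')")


def insert_duplicate():
    """Try to add a second record with the same four identifiers."""
    try:
        conn.execute("INSERT INTO pipe VALUES ('ABC01', 120, 1, 'AO10', 'WW')")
        return True
    except sqlite3.IntegrityError:
        # The database refuses a second row with the same key.
        return False


duplicate_accepted = insert_duplicate()

# Part of the key (sitecode and context) is enough to find other pipes
# from the same excavation context.
same_context = conn.execute(
    "SELECT accession, form FROM pipe WHERE sitecode = ? AND context = ?",
    ("ABC01", 120),
).fetchall()
```

Because the uniqueness rule lives in the database rather than in the data entry form, any tool that writes to the table is held to it.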

The existing comments field was split into four new fields: general comments, maker comments, publication references, and parallels. I ran a report that listed all the existing comments so they could be manually reviewed and separated out into the relevant content areas.

A new field was added to mark pipe records that were to be published on the web. Other enhancements included new fields such as completeness, mould, manufacturing evidence, fabric, pipe length, links to photographs and illustrations, as well as a new numerical field, 'die', to allow the recording of individual dies known to have been used by a single maker or workshop. The final new field was one that allows a particular pipe to be marked as containing the best example of a particular mark.

Some of the new fields required the creation of lists of values. These appear as drop-down menus on the data entry forms, and are used to make data entry faster and reduce errors. They are implemented as tables and can be designed so the values can be edited or added to as required.
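A list of values held as a table might look something like the sketch below. It again uses SQLite in place of Oracle, and the table name, codes and descriptions are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Hypothetical list-of-values table for a 'completeness' field.
    CREATE TABLE lov_completeness (
        code        TEXT PRIMARY KEY,
        description TEXT NOT NULL
    );
    CREATE TABLE pipe (
        pipe_id      INTEGER PRIMARY KEY,
        completeness TEXT REFERENCES lov_completeness(code)
    );
    INSERT INTO lov_completeness VALUES ('C', 'complete'), ('B', 'bowl only');
    """
)


def menu_options():
    """Descriptions a data entry form could show in its drop-down menu."""
    rows = conn.execute("SELECT description FROM lov_completeness ORDER BY code")
    return [row[0] for row in rows]


def record_pipe(pipe_id, code):
    """Store a pipe; codes that are not in the list are rejected."""
    try:
        conn.execute("INSERT INTO pipe VALUES (?, ?)", (pipe_id, code))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = record_pipe(1, "C")
rejected = record_pipe(2, "X")  # 'X' is not in the list yet

# Adding a value is a data change, not a change to the form or the table.
conn.execute("INSERT INTO lov_completeness VALUES ('S', 'stem only')")
accepted_after_edit = record_pipe(3, "S")
```

Keeping the values in a table means the list can grow without anyone touching the form design.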

When creating new fields, it's important to judge the effect on existing data, particularly in a project that can only selectively enhance records. As the project grant covered the enhancement of records for 120 marked clay pipes made between c 1580 and 1680, a small percentage of the entire dataset, many existing records would not have any data for the new fields. If it is not possible to go back and record the relevant information in the new field for each existing record, might that affect the validity of the data set as a whole? Will queries or searches return unexpected results if values aren't recorded consistently? Sometimes it is possible to apply a default value for existing records, or to mark previous records as 'not recorded'.
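To make the 'not recorded' option concrete, here is a small sketch, again with SQLite standing in for Oracle and with invented table, field and value names. It adds a new field with an explicit default so that unenhanced records can be told apart from enhanced ones in later queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pipe (pipe_id INTEGER PRIMARY KEY, mark TEXT)")
conn.executemany(
    "INSERT INTO pipe VALUES (?, ?)", [(1, "IS"), (2, "WW"), (3, "IS")]
)

# Add the new field; existing rows pick up the explicit default rather
# than an ambiguous empty value.
conn.execute(
    "ALTER TABLE pipe ADD COLUMN fabric TEXT NOT NULL DEFAULT 'not recorded'"
)

# Only one record is enhanced as part of the (hypothetical) project.
conn.execute("UPDATE pipe SET fabric = 'fine' WHERE pipe_id = 1")

# Counts per value make the gap in coverage visible.
counts = dict(conn.execute("SELECT fabric, COUNT(*) FROM pipe GROUP BY fabric"))

# Analysis can then be limited to records that were actually enhanced.
enhanced = conn.execute(
    "SELECT COUNT(*) FROM pipe WHERE fabric != 'not recorded'"
).fetchone()[0]
```

Whether a default like this is appropriate still depends on the field: for some attributes 'not recorded' is honest, for others it could be mistaken for a real observation.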

New tables were created to record information about known pipe makers. This includes their name, address, earliest and latest known dates, and free text including documentary evidence for this information. As this information is recorded in the database rather than in text files, it can be more easily searched and combined with related information and pipes for publication.

Additional tables were created to record the relationship between a mark on a particular pipe and a possible maker, including the probability of any pipe being related to an individual maker plus any publication references.
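The relationship between pipes and makers could be sketched as a link table like the one below. This is an assumption-laden illustration in SQLite, not the actual MoLAS design: the table names, columns, makers and dates are all invented, and only the definite/probable/possible scale comes from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE maker (
        maker_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        earliest INTEGER,
        latest   INTEGER
    );
    CREATE TABLE pipe (pipe_id INTEGER PRIMARY KEY, mark TEXT);
    -- One row per suggested attribution of a pipe to a maker.
    CREATE TABLE pipe_maker (
        pipe_id     INTEGER NOT NULL REFERENCES pipe(pipe_id),
        maker_id    INTEGER NOT NULL REFERENCES maker(maker_id),
        probability TEXT NOT NULL
            CHECK (probability IN ('definite', 'probable', 'possible')),
        reference   TEXT,
        PRIMARY KEY (pipe_id, maker_id)
    );
    INSERT INTO maker VALUES (1, 'Example Maker A', 1620, 1640),
                             (2, 'Example Maker B', 1630, 1660);
    INSERT INTO pipe VALUES (10, 'IS');
    INSERT INTO pipe_maker VALUES (10, 1, 'probable', NULL),
                                  (10, 2, 'possible', NULL);
    """
)


def makers_for_pipe(pipe_id):
    """List each candidate maker for a pipe with the recorded certainty."""
    return conn.execute(
        """
        SELECT m.name, pm.probability
        FROM pipe_maker pm
        JOIN maker m ON m.maker_id = pm.maker_id
        WHERE pm.pipe_id = ?
        ORDER BY m.name
        """,
        (pipe_id,),
    ).fetchall()


candidates = makers_for_pipe(10)
```

A separate link table lets one pipe have several candidate makers, and one maker several pipes, without repeating the maker's details.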

Content preparation


Enhancing database records

The basic process for enhancing stamped pipe records was:

1. Add the webcode 'CoLAT' to the pipes that will be included in this project
2. Add the photo number to the Photo number field
3. Review and update the pipes entries
4. Create the makers entries
5. Add pipes to the sub-form on the makers form to create the link between pipe and definite/probable/possible maker

Two queries were created to help monitor progress and give an idea of how the data would look on the website. One was a report showing which records have been successfully marked up for export to the website. The other showed how the links between makers and pipes would be displayed and could be used to check the success of a link created between a mark and possible maker.
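A progress report of the first kind might be sketched as follows. The webcode 'CoLAT' and the photo number field come from the process above, but the table layout, the sample rows and the idea of flagging a missing photo are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pipe (
        pipe_id      INTEGER PRIMARY KEY,
        webcode      TEXT,
        photo_number TEXT
    );
    INSERT INTO pipe VALUES (1, 'CoLAT', 'P001'),
                            (2, 'CoLAT', NULL),
                            (3, NULL, NULL);
    """
)


def export_report():
    """Show each pipe flagged for the website and whether it has a photo."""
    rows = conn.execute(
        """
        SELECT pipe_id, photo_number IS NOT NULL
        FROM pipe
        WHERE webcode = 'CoLAT'
        ORDER BY pipe_id
        """
    )
    return [
        (pipe_id, "ready" if has_photo else "missing photo")
        for pipe_id, has_photo in rows
    ]


report = export_report()
```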

Other content preparation

While the technical database and website development and specialist recording work was underway, Jacqui had organised for the MoLAS photographer to take photos of the marks that were going to appear on the website. The photo number was then recorded in the database. The scripts that generate the website use this to link the right image to the right pipe and mark for display on the web page. Jacqui also wrote text for inclusion on the site and definitions of codes used in the database were created, to make the published records more user-friendly and clear to non-specialists.

The website

The address of the website, 'Clay tobacco pipe makers' marks from London' is
It is held as a 'collections microsite' within the Museum of London website structure.

The front page

The front page is designed to provide direct access to the data while contextualising the content, making the current scope of the project and the goals of the site immediately clear. It also allows us to thank our funders.

The design of the website was based on templates developed for the LAARC site. The front page introduces the navigation and title banner, which remain consistent throughout the site. There are three immediately clear 'calls to action' for the user on the front page: browse maker's marks, browse makers, and search for marks.

Browse maker's marks

In this section of the site, you can view thumbnails of the best example of each mark. The initials or description of the mark are listed with each thumbnail. This means you can search the text of the page for a particular mark, and also aids accessibility and helps search engines index the site, while still being visually appealing.

View mark

From the list of marks, you can access the page for a particular mark. Where appropriate, this page contains a more detailed description of the mark; images of all the pipes with that mark plus the sitecode, excavation context number and bowl form for that pipe. It also displays the dates associated with that form and the die number for each pipe on the page. Each image of a mark is also a link to the particular pipe page.

View pipe

Each pipe page contains the initials or description of its maker's mark, a description of the pipe, its burnishing and milling as well as information about the excavation in which the pipe was found. This includes the address, easting and northing of the site. It would be possible to link to the full site record in LAARC, particularly when the LAARC site has been redeveloped. The page also lists any possible or known makers and the certainty of their being the maker of that pipe. The name of each maker is a link to the maker.

View maker

This page displays the name, address, earliest and latest dates plus any additional commentary and publication references for the maker. It also lists each pipe they might have made with the probability of their being the maker.

Browse makers
This page displays a list of all makers on the site. The name of each maker is a link to the full information about that maker, as above.

The search for this site is fairly simple, but the functionality can be expanded if necessary.
It searches the description field for a match to the search term.
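A search of this kind can be sketched in a few lines. The example below is in Python with SQLite rather than the ASP and Oracle used for the site, the table and column names are invented, and the second sample description is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mark (mark_id INTEGER PRIMARY KEY, description TEXT);
    INSERT INTO mark VALUES
        (1, 'Wheel symbol with pellets between the spokes'),
        (2, 'Initials IS in relief');
    """
)


def search_marks(term):
    """Return descriptions containing the search term anywhere in the text."""
    # The term is passed as a parameter, never pasted into the SQL string.
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT description FROM mark WHERE description LIKE ? ORDER BY mark_id",
        (pattern,),
    )
    return [row[0] for row in rows]


wheel_results = search_marks("wheel")
no_results = search_marks("crown")
```

SQLite's LIKE ignores case for ASCII letters by default, so 'wheel' matches 'Wheel'. A term containing % or _ would be treated as a wildcard, which a fuller implementation would need to escape.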

About the project

This section contains excellent information for general visitors and specialists about the project, why clay pipes are important for archaeologists, the clay pipes of London as well as a glossary and references. These pages contextualise the study of clay pipes, enriching the general visitor's experience and providing specific information about the background to tobacco pipe makers’ marks found in excavated contexts from London for specialist users.

From the database to the website

The first step in publishing the enhanced content online is getting data from the internal database to the web server. The web server is the computer that sends out the pages when a visitor clicks on the link.

SQL scripts run on the MoLAS server extract data from the MoLAS Oracle database and other sources, combining them into a form suitable for publication on the website. This data is stored in tables on the web server database. Information about the archaeological sites is drawn from data published through the London Archaeological Archive Resource Centre (LAARC) web site. This data is linked through the sitecode on the pipe or mark database record.
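The extraction step could be pictured roughly as below. This is only a sketch: two in-memory SQLite databases stand in for the internal Oracle database and the web server database, a dictionary stands in for the LAARC site data, and all names, addresses and table layouts are invented.

```python
import sqlite3

internal = sqlite3.connect(":memory:")  # stands in for the internal database
web = sqlite3.connect(":memory:")       # stands in for the web server database

internal.executescript(
    """
    CREATE TABLE pipe (
        pipe_id INTEGER PRIMARY KEY, sitecode TEXT, mark TEXT, webcode TEXT
    );
    INSERT INTO pipe VALUES (1, 'ABC01', 'IS', 'CoLAT'),
                            (2, 'ABC01', 'WW', NULL);
    """
)
web.execute(
    """
    CREATE TABLE web_pipe (
        pipe_id INTEGER PRIMARY KEY, mark TEXT, sitecode TEXT, site_address TEXT
    )
    """
)

# Hypothetical site details, keyed by sitecode.
SITE_INFO = {"ABC01": "1 Example Street"}


def publish():
    """Copy flagged pipes to the web database, adding site details."""
    rows = internal.execute(
        "SELECT pipe_id, mark, sitecode FROM pipe WHERE webcode = 'CoLAT'"
    ).fetchall()
    web.executemany(
        "INSERT OR REPLACE INTO web_pipe VALUES (?, ?, ?, ?)",
        [(pid, mark, site, SITE_INFO.get(site)) for pid, mark, site in rows],
    )
    web.commit()
    return len(rows)


published_count = publish()
published = web.execute("SELECT * FROM web_pipe").fetchall()
```

Using INSERT OR REPLACE means the script can be re-run after records are updated without creating duplicates.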

The scripts also combine information that has been stored separately into the publication format according to the relationships defined in the database. For example, they may bring together a pipe with possible makers through the pipe marks recorded. Where necessary, the database extraction scripts also extract the code definitions from List of Value tables. These translate a value like the mark position code 'BR' into the more user-friendly 'on the bowl, on right side as smoked'.
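The code translation can be illustrated with a simple lookup. In this sketch the 'BR' definition is the one quoted above; the dictionary, the function name and the fallback behaviour for unknown codes are assumptions.

```python
# Stand-in for a List of Values table: code -> published wording.
MARK_POSITION = {
    "BR": "on the bowl, on right side as smoked",
}


def expand_position(code):
    """Return the reader-friendly wording, or the raw code if none is defined."""
    return MARK_POSITION.get(code, code)


def prepare_for_web(record):
    """Copy a record, replacing the position code with its definition."""
    published = dict(record)
    published["position"] = expand_position(record["position"])
    return published


web_record = prepare_for_web({"mark": "IS", "position": "BR"})
```

Falling back to the raw code is one possible choice: it keeps a record publishable even if a definition has not been written yet.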

The website is generated by another collection of scripts, using a web scripting language called ASP. These scripts can be thought of as templates that contain placeholders for different types of information or images. When a page is requested, the script runs and fills the appropriate information in the appropriate part of the template.
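The template idea can be shown with Python's standard library rather than ASP. In this sketch the page layout, field names and sample record are invented; the point is only that the same template is filled with different data on each request.

```python
from html import escape
from string import Template

# A template with named placeholders for the parts that vary per pipe.
PIPE_PAGE = Template(
    "<h1>Pipe with mark $mark</h1>\n"
    "<p>$description</p>\n"
    "<p>Site: $sitecode, context $context</p>"
)


def render_pipe_page(record):
    """Fill the template, escaping values so they are safe to show as HTML."""
    safe = {key: escape(str(value)) for key, value in record.items()}
    return PIPE_PAGE.substitute(safe)


page = render_pipe_page(
    {
        "mark": "IS",
        "description": "Burnished & milled bowl",
        "sitecode": "ABC01",
        "context": 120,
    }
)
```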

Because these templates dynamically generate the pages, the design and the content are separated. This means the site can be expanded as new records are added to the database. Updated or new content can appear on the site instantly, without waiting for IT resources to be available. It's also easy to update the templates so that new fields can be viewed, new links generated or additional search parameters added. The graphic design (the 'look and feel' of the site) or the site navigation can be updated in a single script and the change is immediately visible across all site pages.

The 'about' section of the website also contains 'static' pages. These pages do not change when the content of the database changes. However, they use the same scripts to generate the design and navigation so can easily be changed as necessary.
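A shared layout used by both kinds of page might look like this in outline. The navigation labels echo the sections of the site, but the function, markup and page bodies are assumptions for the sake of the example.

```python
from html import escape

# One list controls the navigation on every page.
NAV_ITEMS = ["Browse maker's marks", "Browse makers", "Search for marks"]


def layout(title, body_html):
    """Wrap any page body, database-driven or static, in the shared design."""
    nav = "".join(f"<li>{escape(item)}</li>" for item in NAV_ITEMS)
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body><ul>{nav}</ul>{body_html}</body></html>"
    )


dynamic_page = layout("View maker", "<p>Generated from the database</p>")
static_page = layout("Glossary", "<p>Text that does not depend on the database</p>")

# A navigation change made in one place shows up on every page rendered after it.
NAV_ITEMS.append("About the project")
updated_page = layout("Glossary", "<p>Text that does not depend on the database</p>")
```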

One of the requirements for the site was that it was accessible to search engines. As the project did not have a budget for marketing the site, search engines were going to be the main source of website visitors. The site was designed using 'semantic markup' which not only helps search engines understand the structure of the site, but also aids accessibility for people with disabilities.


Catalhoyuk diaries: August 2006

It's hard to believe my time here has flown so quickly. I’m sad to be leaving on Friday, and I feel like there’s not enough time to get everything I want to done before I go. The Istanbul team left today and some specialists are starting to leave, so the site already feels a little like it’s winding down for the season.

This season I've mostly been concentrating on working through recording structures and developing the interface for a unified Clay Objects database application.

I've been talking about my ideas for this application for the past two years so it's great to finally get a chance to make it real. I've had lots of intense discussions with pottery, figurines and building materials specialists as well as a new member of the team who is looking at the changes in the fabric of the site across artefact types and excavation processes.

It has been a real advantage to work with someone who'll actually be benefiting from this unified approach to recording (particularly someone with a background in geology who is able to bring a lot of technical expertise to the project), and to re-examine existing recording so that, as far as possible, clay fabrics or matrices are recorded according to diagnostic evidence.

I've also been working with the Human Remains team on their database, but they've made my job easy by developing their own forms based on the data structures I sent from London. They've been really inspiring – it's one thing to tell people that they have the power to create their own interfaces and queries, but it's so much better to see people actually do it.

In between that (and sometimes the development happens in between solving other problems) I've been dealing with smaller fixes to existing databases, network issues and all the kinds of things that come up when people ask you any computer-related questions.

My brain has been working overtime; there are so many new ways of interrogating the databases now that everything is becoming integrated that the possibilities seem endless. We’ve had a few seminars, and people presenting their PhD research, and I come out with ideas for new improvements every time. It’s also been fascinating hearing how their data translates into a picture of life lived on the mound.

The model of creating 'new' database applications by combining existing data across specialisms with new interpretation and specialist recording will be the basis for the architectural and beads databases, and I'm already excited by that. One idea that fascinates me is recording things like the wall paintings in the database so they can be linked to other representational artefacts like figurines and stamp seals. A 'representational database' could look at similarities and differences in images across all and any materials or artefact types. Do wall paintings show the same kinds of artistic, personal or cultural concerns as the figurines? Do certain types of features occur with certain kinds of representational artefacts?

Anyway, I don’t want to hog the internet computer, so it’s back to work for me.

[Originally published on, August 2, 2006]

Catalhoyuk diaries: Settling in

It's my first proper working day on site this season and I'm slowly working my way through Sarah's documentation of the database work and general IT issues she's encountered while she was here. At this stage, I only have one major 'new' application to work on, and in large part that's thanks to Sarah's hard work over the past months, both on- and off-site.

I'm hoping that now that the hard grunt work of centralising, bug fixing, cleaning and consolidating the existing databases over the past few years is (mostly) over, and the applications I created in previous years are bedded in, I'll have a real chance to think about what else we can do with all this data. I was so busy before I left London that I hadn't really had a chance to get excited about coming back to Catalhoyuk but as soon as I was on my way I realised that this could be an immensely intellectually rich and rewarding two weeks.

There's always so much new technology, I'm sure there's a knack to not getting carried away by every new possibility. But I can't help but wonder what would happen if we recreated Catalhoyuk in Second Life or another 3D world. Imagine re-populating the mound with a living community of real people!

I'd love to see how we could use semantic web/Web 2.0 technologies to open up our data to the rest of the world. I'm interested in the tagging technologies emerging through folksonomies like, and wonder if we could apply them to the finds data we publish on the web.

I've realised that you could almost think of the excavation diary entries as blog posts, in which case Catalhoyuk has a blog that goes back to 1997.

[Originally published on, July 22, 2006]