Some of the history of the Çatalhöyük database

Just in case you thought nothing happened during the off-season…

A lot of this information is contained in the Archive Reports but as the audience for those is probably more specialised than the average reader of this blog, I thought it might be interesting to talk about it here.

When MoLAS first became involved with the project, there were lots of isolated Microsoft Access 2000 databases for excavation, finds and specialist data. I could see that the original database was well designed and structured, and that much valuable work had been done on it previously. However, some problems had arisen over the years as the database grew and different specialists brought their own systems based on a mixture of applications and platforms.

It was difficult for specialist databases to use live field or excavation data because it wasn’t available in a single central source. It had also become almost impossible to run queries across excavation seasons or areas, or produce multi-disciplinary analysis, as there were disparate, unrelated databases for each area of study. Within many specialisms the data set had been broken up into many different files – for example, the excavation database was split into teams and some teams were creating separate files for different years.

In many cases, referential integrity was not properly enforced in the interface or database structure. While the original database structures included tables to supply lists of values to enable controlled vocabularies, the interfaces were using static rather than dynamic menus on data entry interfaces. Primary and/or foreign keys were not implemented in some databases, leading to the possibility of multiple entries, anomalous data or incorrect codes being recorded. There was little or no validation on data entry.
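
The missing constraints are easy to picture. As a minimal sketch (using SQLite via Python, with invented table and column names – the real schema was different), a lookup table supplies the controlled vocabulary and a foreign key lets the database itself reject a bad code, instead of leaving all validation to the data entry form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection

# The lookup table holds the controlled vocabulary...
conn.execute("""
    CREATE TABLE material_codes (
        code TEXT PRIMARY KEY,
        description TEXT NOT NULL
    )""")

# ...and the data table can only reference codes that exist in it.
conn.execute("""
    CREATE TABLE finds (
        find_id INTEGER PRIMARY KEY,
        unit_number INTEGER NOT NULL,
        material TEXT NOT NULL REFERENCES material_codes(code)
    )""")

conn.execute("INSERT INTO material_codes VALUES ('OB', 'Obsidian')")

# A recognised code is accepted...
conn.execute("INSERT INTO finds VALUES (1, 1044, 'OB')")

# ...but a mistyped code is rejected by the database itself,
# whether or not the interface bothered to validate it.
try:
    conn.execute("INSERT INTO finds VALUES (2, 1044, 'XX')")
except sqlite3.IntegrityError:
    print("rejected: unknown material code")
```

With the constraint in the structure rather than the form, every interface (Access form, web form, bulk import) gets the same protection for free.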

IBM generously donated two new servers, one for use on site and the other for the Cambridge office. This meant that we were able to install Microsoft SQL Server 2000 to use as a single backend database and start re-centralising the databases. This meant re-combining the disparate datasets into a single, central database, and reconfiguring the Access forms to use this new centralised backend.

Centralising and cleaning the data and interfaces was a bit of a slog (covered in more detail in the archive reports), and even now there are still bits and pieces to be done. I guess this shows the importance of proper database design and documentation, even when you think a project is only going to be small. I’m sure there was documentation originally, so I guess this also shows the importance of a good archiving system!

Unfortunately, because the ‘business logic’ of the database applications wasn’t documented (there probably was documentation that was lost over time), we couldn’t re-do the existing forms in another application (like web forms) without losing all the validation and data entry rules that had been built up over time in response to the specialists’ requirements – and, as usual in the world of archaeology, limited resources meant reconstructing those rules wasn’t possible at that stage. A lot of the application logic seemed to be held in the interfaces rather than in the relationships between tables, which meant a lot of data cleaning had to be done when centralising the databases and enforcing relationships.
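
That sort of clean-up starts with finding the records that would violate the relationships you want to enforce. A minimal illustration (again Python/SQLite with hypothetical names): a LEFT JOIN surfaces the ‘orphaned’ finds whose unit number doesn’t exist, and those have to be investigated and fixed before a foreign key can be switched on at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (unit_number INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE finds (find_id INTEGER PRIMARY KEY, unit_number INTEGER)")
conn.executemany("INSERT INTO units VALUES (?)", [(1044,), (1045,)])
conn.executemany("INSERT INTO finds VALUES (?, ?)",
                 [(1, 1044), (2, 1045), (3, 9999)])  # 9999: a mistyped unit

# Orphaned finds: rows whose unit_number has no matching unit record.
# Each one needs checking (often with the specialists) before the
# relationship can be enforced.
orphans = conn.execute("""
    SELECT f.find_id, f.unit_number
    FROM finds AS f
    LEFT JOIN units AS u ON f.unit_number = u.unit_number
    WHERE u.unit_number IS NULL""").fetchall()
print(orphans)  # → [(3, 9999)]
```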

As the 2004 Archive Report says, “The existing infrastructure was Microsoft Access based, and after consideration for minimal interruption to existing interfaces, and for the cost to the project of completely redeveloping the forms on another platform, these applications were retained.”

Luckily, we’re not tied to Access for new application development, and new and future database applications are being developed as web (HTML) applications, eliminating any platform/OS compatibility issues.

This means that we can get on with more exciting things in the future! I’ll post about some of those ideas soon.

In the meantime, check out the public version of the web interface to the Çatalhöyük database.

[Originally published on http://www.catalhoyuk.com/blog/, January 24, 2007]

Notes on usability testing

Further to my post about the downloadable usability.gov guidelines, I’ve picked out the bits from the chapter on ‘Usability Testing’ that are relevant to my work, but it’s worth reading the whole chapter if you’re interested. My comments or headings are in square brackets below.

“Generally, the best method is to conduct a test where representative participants interact with representative scenarios.

The second major consideration is to ensure that an iterative approach is used.

Use an iterative design approach

The iterative design process helps to substantially improve the usability of Web sites. One recent study found that the improvements made between the original Web site and the redesigned Web site resulted in thirty percent more task completions, twenty-five percent less time to complete the tasks, and sixty-seven percent greater user satisfaction. A second study reported that eight of ten tasks were performed faster on the Web site that had been iteratively designed. Finally, a third study found that forty-six percent of the original set of issues were resolved by making design changes to the interface.

[Soliciting comments]

Participants tend not to voice negative reports. In one study, when using the ’think aloud’ [as opposed to retrospective] approach, users tended to read text on the screen and verbalize more of what they were doing rather than what they were thinking.

[How many user testers?]

Performance usability testing with users:
– Early in the design process, usability testing with a small number of users (approximately six) is sufficient to identify problems with the information architecture (navigation) and overall design issues. If the Web site has very different types of users (e.g., novices and experts), it is important to test with six or more of each type of user. Another critical factor in this preliminary testing is having trained usability specialists as the usability test facilitator and primary observers.
– Once the navigation, basic content, and display features are in place,
quantitative performance testing … can be conducted

[What kinds of prototypes?]

Designers can use either paper-based or computer-based prototypes. Paper-based prototyping appears to be as effective as computer-based prototyping when trying to identify most usability issues.

Use inspection evaluation [and cognitive walkthroughs] results with caution.
Inspection evaluations include heuristic evaluations, expert reviews, and cognitive walkthroughs. It is a common practice to conduct an inspection evaluation to try to detect and resolve obvious problems before conducting usability tests. Inspection evaluations should be used cautiously because several studies have shown that they appear to detect far more potential problems than actually exist, and they also tend to miss some real problems.

Heuristic evaluations and expert reviews may best be used to identify potential usability issues to evaluate during usability testing. To improve somewhat on the performance of heuristic evaluations, evaluators can use the ’usability problem inspector’ (UPI) method or the ’Discovery and Analysis Resource’ (DARe) method.

Cognitive walkthroughs may best be used to identify potential usability issues to evaluate during usability testing.

Testers can use either laboratory or remote usability testing because they both elicit similar results.

[And finally]

Use severity ratings with caution.”

Useful background on usability testing

I came across www.usability.gov while looking for some background information on usability testing to send to colleagues I’m planning some user evaluation with. It looks like a really useful resource for all stages of a project, from planning to deployment.

Their guidelines are available to download in PDF form, either as an entire book or as specific chapters.

“Encouraging a “There Are No Dumb Questions” culture is only part of the solution. What we really need is a “There are No Dumb Answers” policy.”

How to Build a User Community, Part 1 offers some good solutions to the kinds of issues I’ve worried about when thinking about our user communities. I think it’s a good basis for some guidelines but really we just need to get it up and running and see how our users respond.

Are small museums the long tail?

On the way home from the Semantic Web Think Tank last week (see previous post), I suddenly thought: are small or specialised museums the long tail?

Each museum by itself would represent a tiny proportion of the overall use of museum collections online, but if you put all that usage together, would their collections in fact have a higher rate of use than those of more ‘popular’ museums?

At the moment I don’t think there’s any way to find out, because so many small or specialised museums don’t have collections online, through a lack of expertise, digitisation resources or an easy-to-use publication infrastructure. Still, it’s an interesting question.

Semantic Web ThinkTank

I went to the Semantic Web Think Tank meeting on “Social Software and the User Experience of the Semantic Web” in Brighton on Thursday. I’m still thinking about the discussions a lot, but here are some of my thoughts. This isn’t an official report of the day, and they’re in entirely random order and mixed up with other issues I’ve been thinking about lately.

We were asked to introduce ourselves and briefly describe our interest in the Semantic Web at the start of the session. I explained that I have a long-standing interest in user experiences online, and in the presentation of collections online. I’ve been interested in discovering whether we’re actually using the most effective schema, formats, navigation and interfaces for our audiences for a long time, so sessions like this are a delight.

User-generated content
A lot of the conversation was about user-generated content rather than the users’ experience of the semantic web, possibly because museums are thinking hard about user-generated content at the moment.

We talked about models for the presentation of user-generated content that would suggest users are comfortable distinguishing between content generated within an institution and that written by other users, such as Amazon reviews.

I didn’t raise this at the time but while the overall quality of Amazon reviews and Wikipedia entries are encouraging, the Yahoo! Answers service makes me despair for humanity. Maybe it doesn’t have the snob value of other social software sites like Amazon or Wikipedia, but the answers tend to be pretty low quality and sometimes possibly even maliciously wrong. Importantly, stupid or bad answers don’t tend to be rated down the way a less insightful Amazon review would be.

However, overall it does show that there are existing models of user-generated content that we can follow – we don’t have to invent them to start publishing user-generated content on our museum websites.

As an aside, hopefully our users won’t discount ‘official’ museum content the way users tend to disregard the publisher blurbs on Amazon – we’re told that users regard museums as trustworthy and ‘objective’, and I would hope they’re also regarded with some affection.

I’d never really thought about using folksonomies as a form of feedback that would inform the process of creating ontologies but once Areti raised it I started thinking about it. I guess I’ve always seen them as serving slightly different purposes, and as I don’t think they compete in any sense, I hadn’t seen the need to change how ontologies are constructed. I guess it depends – while internal ontologies don’t need to be user-friendly, museums have a tendency to re-use them as navigation and information architecture on a website, where they do need to suit the audiences.

There was some discussion about the barriers to participation for museums and the possibility of resistance from curators and other museum staff. I’ve been lucky that so far I haven’t encountered any resistance but I think generally we can use internal goodwill to engage new, non-traditional or disengaged audiences as a motivator. Our barriers to participation are those old favourites, time and money.

I think I must have been hungry, because I started thinking of collections delivered as RSS as ‘home delivery’ from a range of menus, and traditional online collections as going to a restaurant – the restaurant chooses the range of items you can order and the format in which they’ll be delivered.

Not all users are equal!
User-generated content isn’t written by random voices from an undifferentiated mass of users. Reputation and trust are important, whether for ‘Real Name’ reviewers on Amazon, established authors on Wikipedia, or eBay sellers with good feedback. What impact might that have on museum content that’s ‘leaked out’ and lost its original context?

Knowing what our users want matters, and simultaneously doesn’t
Towards the end of the morning session I decided that we can’t predict how semantic web users will use unfettered access, so maybe we should just build it and see what happens, instead of trying to second guess them. In a way, the Semantic Web is post-User Centred Design because we’re not designing the applications but providing repositories of data that can be used in what could be called User Created Applications. It’s not that users don’t matter, it’s that now isn’t the time to make assumptions about what they want – our dialogue with them should be very open-ended.

As for what ‘it’ is – maybe a repository of objects published in a sector-wide digital object model or schema? There was some discussion of whether objects could be published in microformats, but I think they’re too big for that. On the other hand, if we have a repository where each object has a permanent URI, we could put selected data into microformats that refer back to the URI.

We can better predict how user-generated content might relate to our existing infrastructure so we should try to cater to known models and requirements.

Funding
The semantic web can cause problems for museums funded according to the number of visitors through their door or to their website. We need to redefine the measures of success to incorporate content that’s used outside the infrastructure of the originating museum; or we can refer users on to commercial services such as picture libraries. In terms of development, we can aim to create re-usable and sustainable infrastructures in any new applications so they can be used to deliver content both to the target audience/application and beyond.

Other random thoughts
I’ve also realised that maybe we need to take a step back and ask “do we actually know who our users are?” before we can assess the effectiveness of online collections. There are generally accepted groupings of users we talk about, but do they reflect reality? It may well be that we’re on the right track but it would be good to confirm this. Interestingly, since I started writing this post, I’ve noticed that my old workmates Jonny Brownbill and Darren Peacock are presenting a session titled Audiences, Visitors and Users: Reconceptualising users of museum online content and services at MW2007 so hopefully research in this area is moving forward.

I can relate this to the discussions on Thursday but actually it came out of a conversation I had with my workmate Jeremy beforehand: does Australia’s history with models of distance education like the “School of the Air” mean that Australian museums have a different understanding of how to present collections online? Australian museums have had extensive collections online for years, possibly from a lot earlier than museums in Europe or North America.

Update: the workshop report is now online.

“Shoppers are likely to abandon a website if it takes longer than four seconds to load, a survey suggests.

It found 75% of the 1,058 people asked would not return to websites that took longer than four seconds to load.” Akamai study as reported on the BBC.

It’s a study of online shopping habits, but I wonder if the same holds for cultural sector sites – I don’t know of any comparable research, which I guess says something either about the limits of my knowledge of existing audience evaluation or about the paucity of existing information.

The article doesn’t report whether the study analysed the results by gender, but this article, Key Website Research Highlights Gender Bias, suggests that gender makes a big difference to the user experience:

“Despite the parity of target audience, the results found that 94% of the sites displayed a masculine orientation with just 2% displaying a typically female bias.”