Some of the history of the Catalhoyuk database

I was going to post this on the Catalhoyuk blog, but authentication isn't working right now, so I'll post it here and move it over when it's working again.

Just in case you thought nothing happened during the off-season…

A lot of this information is contained in the Archive Reports, but as their audience is probably more specialised than the average reader of this blog, I thought it might be interesting to talk about it here.

When MoLAS first became involved with the project, there were lots of isolated Microsoft Access 2000 databases for excavation, finds and specialist data. I could see that the original database design was well structured and that much valuable work had been done on the database previously. However, some problems had arisen over the years as the database grew and different specialists brought their own systems, based on a mixture of applications and platforms.

It was difficult for specialist databases to use live field or excavation data because it wasn't available in a single central source. It had also become almost impossible to run queries across excavation seasons or areas, or to produce multi-disciplinary analyses, as there were disparate, unrelated databases for each area of study. Within many specialisms the data set had been broken up into many different files: for example, the excavation database was split by team, and some teams were creating separate files for different years.

In many cases, referential integrity was not properly enforced in the interface or database structure. While the original database structures included tables to supply lists of values for controlled vocabularies, the data entry interfaces used static menus rather than dynamic menus drawn from those tables. Primary and/or foreign keys were not implemented in some databases, which made it possible to record duplicate entries, anomalous data or incorrect codes. There was little or no validation on data entry.
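To make the difference concrete, here is a minimal sketch of enforcing those rules in the database itself rather than in the forms. It is not the Çatalhöyük schema: the project's backend was SQL Server 2000, whereas this uses Python's built-in sqlite3 module so that it runs on its own, and the table and column names (material_type, unit, find) and the codes are hypothetical. The lookup table holds the controlled vocabulary and a "dynamic menu" simply reads from it; primary and foreign keys stop duplicates and incorrect codes from being recorded, whichever interface is used.

```python
import sqlite3
from typing import List, Optional, Sequence

FIND_SQL = "INSERT INTO find (find_id, unit_id, material, weight_g) VALUES (?, ?, ?, ?)"


def build_schema(conn: sqlite3.Connection) -> None:
    """Create a small, hypothetical schema whose rules live in the database, not the forms."""
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Lookup table: the controlled vocabulary is stored as data.
        CREATE TABLE material_type (
            code  TEXT PRIMARY KEY,
            label TEXT NOT NULL
        );
        -- Primary key: each excavation unit can only be recorded once.
        CREATE TABLE unit (
            unit_id INTEGER PRIMARY KEY,
            area    TEXT NOT NULL
        );
        -- Foreign keys: a find must point at an existing unit and a valid code.
        -- CHECK: a simple validation rule held in the table, not the form.
        CREATE TABLE find (
            find_id  INTEGER PRIMARY KEY,
            unit_id  INTEGER NOT NULL REFERENCES unit (unit_id),
            material TEXT NOT NULL REFERENCES material_type (code),
            weight_g REAL CHECK (weight_g IS NULL OR weight_g > 0)
        );
        """
    )


def menu_options(conn: sqlite3.Connection) -> List[str]:
    """A 'dynamic menu': the options are read from the lookup table every time."""
    return [code for (code,) in conn.execute("SELECT code FROM material_type ORDER BY code")]


def try_insert(conn: sqlite3.Connection, sql: str, params: Sequence) -> Optional[str]:
    """Return None if the row is accepted, or the database's error message if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


def seeded_connection() -> sqlite3.Connection:
    """An in-memory database with a few illustrative codes and one unit."""
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    with conn:
        conn.executemany(
            "INSERT INTO material_type (code, label) VALUES (?, ?)",
            [("CLAY", "Clay"), ("BONE", "Animal bone"), ("OBS", "Obsidian")],
        )
        conn.execute("INSERT INTO unit (unit_id, area) VALUES (?, ?)", (1000, "South"))
    return conn
```

For example, with a connection from seeded_connection(), inserting a find with the mistyped code "CLAYY" is refused with SQLite's "FOREIGN KEY constraint failed" error, and adding a second unit 1000 is rejected by the primary key. Because the checks live in the tables, any future form writing to the same database inherits them.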

IBM generously donated two new servers, one for use on site and the other for the Cambridge office. This meant we could install Microsoft SQL Server 2000 as a single backend database and start re-centralising the databases: re-combining the disparate datasets into a single, central database, and reconfiguring the Access forms to use the new centralised backend.
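Re-combining files that had been split by team and by year raises a practical question: what happens when the same key turns up in two files? The sketch below is one possible approach, not a description of the actual migration, which used SQL Server 2000 and the existing Access forms. It uses Python's built-in sqlite3 module as a self-contained stand-in, and the table, column and file labels (unit, source, team_a_2004) are hypothetical. It records which file each row came from and sets aside rows whose key is already taken instead of silently overwriting them.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical minimal record: (unit_id, description).
Row = Tuple[int, str]


def centralise(sources: Dict[str, List[Row]]) -> Tuple[sqlite3.Connection, List[Tuple[str, int]]]:
    """Merge per-team/per-year datasets into one central table.

    Each row keeps a note of the file it came from. Rows whose key is already
    taken are not overwritten; they are returned as conflicts for manual cleaning.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE unit (
            unit_id     INTEGER PRIMARY KEY,
            description TEXT NOT NULL,
            source      TEXT NOT NULL  -- original file the row came from
        )
        """
    )
    conflicts: List[Tuple[str, int]] = []
    for source, rows in sources.items():
        for unit_id, description in rows:
            try:
                with conn:  # one small transaction per row
                    conn.execute(
                        "INSERT INTO unit (unit_id, description, source) VALUES (?, ?, ?)",
                        (unit_id, description, source),
                    )
            except sqlite3.IntegrityError:
                # Same unit number already loaded from an earlier file: keep the first, flag this one.
                conflicts.append((source, unit_id))
    return conn, conflicts
```

Keeping the first row and listing the rest is a deliberately cautious choice: deciding which of two conflicting records is correct is a specialist judgement, so the script only surfaces the conflicts for cleaning.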

Centralising and cleaning the data and interfaces was a bit of a slog (covered in more detail in the archive reports), and even now there are still bits and pieces to be done. I guess this shows the importance of proper database design and documentation, even when you think a project is only going to be small. I'm sure there was documentation originally, so I guess this also shows the importance of a good archiving system!

Unfortunately, because the 'business logic' of the database applications wasn't documented (if there had been documentation, it had been lost over time), we couldn't rebuild the existing forms in another application (such as web forms) without losing all the validation and data entry rules that had been built up over time in response to the specialists' requirements. Reconstructing those rules would have taken time, and, as usual in the world of archaeology, limited resources meant this wasn't possible at that stage. A lot of the application logic seemed to be held in the interfaces rather than in the relationships between tables, which meant a lot of data cleaning had to be done when centralising the databases and enforcing relationships.

As the 2004 Archive Report says, "The existing infrastructure was Microsoft Access based, and after consideration for minimal interruption to existing interfaces, and for the cost to the project of completely redeveloping the forms on another platform, these applications were retained."

Luckily, we're not tied to Access for new application development: new and future database applications are being created as web (HTML) applications, which avoids platform and operating system compatibility issues.

This means that we can get on with more exciting things in the future! I'll post about some of those ideas soon.

In the meantime, check out the public version of the web interface to the Çatalhöyük database.

[Originally published on http://www.catalhoyuk.com/blog/, January 24, 2007]

Catalhoyuk diaries: What I did on my summer holidays

I was on site at Çatalhöyük for two weeks, and while I was there I contributed to the Catalhoyuk blog.

For me, it was a good opportunity to explain what I do on site – people are often confused about why a database developer would be going out to work on site in Turkey.

After that, I spent a few days in Istanbul, where I went to the Çatalhöyük exhibition and also to Istanbul Modern.

Then I caught a train to Bucharest, where I started my holiday. I visited Romania, Moldova, Transdniestr, Ukraine, and finally flew home from Krakow a month later.

Catalhoyuk diaries: August 2006

It's hard to believe my time here has flown by so quickly. I’m sad to be leaving on Friday, and I feel like there’s not enough time to get everything done that I want to before I go. The Istanbul team left today and some specialists are starting to leave, so the site already feels a little like it’s winding down for the season.

This season I've mostly been concentrating on working through recording structures and developing the interface for a unified Clay Objects database application.

I've been talking about my ideas for this application for the past two years, so it's great to finally get a chance to make it real. I've had lots of intense discussions with pottery, figurines and building materials specialists, as well as a new member of the team who is looking at the changes in the fabric of the site across artefact types and excavation processes.

Working with someone who'll actually benefit from this unified approach to recording has been a real advantage, particularly someone with a background in geology who is able to bring a lot of technical expertise to the project. Together we've been re-examining existing recording so that, as far as possible, clay fabrics or matrices are recorded according to diagnostic evidence.

I've also been working with the Human Remains team on their database, but they've made my job easy by developing their own forms based on the data structures I sent from London. They've been really inspiring – it's one thing to tell people that they have the power to create their own interfaces and queries, but it's so much better to see people actually do it.

In between (and sometimes the development happens in between solving other problems), I've been making smaller fixes to existing databases, dealing with network issues and handling all the other things that come up when people can ask you any computer-related question.

My brain has been working overtime; there are so many new ways of interrogating the databases now that everything is becoming integrated that the possibilities seem endless. We’ve had a few seminars, and people presenting their PhD research, and I come out with ideas for new improvements every time. It’s also been fascinating hearing how their data translates into a picture of life lived on the mound.

The model of creating 'new' database applications by combining existing data across specialisms with new interpretation and specialist recording will be the basis for the architectural and beads databases, and I'm already excited by that. One idea that fascinates me is recording things like the wall paintings in the database so they can be linked to other representational artefacts like figurines and stamp seals. A 'representational database' could look at similarities and differences in images across any and all materials or artefact types. Do wall paintings show the same kinds of artistic, personal or cultural concerns as the figurines? Do certain types of features occur with certain kinds of representational artefacts?

Anyway, I don’t want to hog the internet computer, so it’s back to work for me.

[Originally published on http://www.catalhoyuk.com/blog/, August 2, 2006]

Catalhoyuk diaries: Settling in

It's my first proper working day on site this season and I'm slowly working my way through Sarah's documentation of the database work and general IT issues she encountered while she was here. At this stage, I only have one major 'new' application to work on, and in large part that's thanks to Sarah's hard work over the past months, both on- and off-site.

I'm hoping that now that the hard grunt work of centralising, bug fixing, cleaning and consolidating the existing databases over the past few years is (mostly) over, and the applications I created in previous years are bedded in, I'll have a real chance to think about what else we can do with all this data. I was so busy before I left London that I hadn't really had a chance to get excited about coming back to Catalhoyuk, but as soon as I was on my way, I realised that this could be an immensely intellectually rich and rewarding two weeks.

There's always so much new technology that I'm sure there's a knack to not getting carried away by every new possibility. But I can't help but wonder what would happen if we recreated Catalhoyuk in Second Life or another 3D world. Imagine re-populating the mound with a living community of real people!

I'd love to see how we could use semantic web/Web 2.0 technologies to open up our data to the rest of the world. I'm interested in the social tagging emerging through folksonomy projects like steve.museum, and wonder if we could apply it to the finds data we publish on the web.

I've realised that you could almost think of the excavation diary entries as blog posts, in which case Catalhoyuk has a blog that goes back to 1997.

[Originally published on http://www.catalhoyuk.com/blog/, July 22, 2006]