Just in case you thought nothing happened during the off-season…
A lot of this information is contained in the Archive Reports but as the audience for those is probably more specialised than the average reader of this blog, I thought it might be interesting to talk about them here.
When MoLAS first became involved with the project, there were lots of isolated Microsoft Access 2000 databases for excavation, finds and specialist data. I could see that the original database design and structure was well structured and much valuable work had been done on the database previously. However, some problems had arisen over the years as the database grew and different specialists brought their own systems based on a mixture of applications and platforms.
It was difficult for specialist databases to use live field or excavation data because it wasn't available in a single central source. It had also become almost impossible to run queries across excavation seasons or areas, or produce multi-disciplinary analysis , as there were disparate unrelated databases for each area of study. Within many specialisms the data set has been broken up into many different files – for example, the excavation database was split into teams and some teams were creating separate files for different years.
In many cases, referential integrity was not properly enforced in the interface or database structure. While the original database structures included tables to supply lists of values to enable controlled vocabularies, the interfaces were using static rather than dynamic menus on data entry interfaces. Primary and/or foreign keys were not implemented in some databases, leading to the possibility of multiple entries, anomalous data or incorrect codes being recorded. There was little or no validation on data entry.
IBM generously donated two new servers, one for use on site and the other for the Cambridge office. This meant that we were able to install Microsoft SQL Server 2000 to use as a single backend database and start re-centralising the databases. This meant re-combining the disparate datasets into a single, central database, and reconfiguring the Access forms to use this new centralised backend.
Centralising and cleaning the data and interfaces was a bit of a slog (covered in more detail in the archive reports, and even now there are still bits and pieces to be done. I guess this shows the importance of proper database design and documentation, even when you think a project is only going to be small. I'm sure there was documentation originally, so I guess this also shows the importance of a good archiving system!
Unfortunately, because the 'business logic' of the database applications wasn't documented (there probably was documentation that was lost over time) we couldn't re-do the existing forms in another application (like web forms) without losing all the validation and data entry rules that had been built up over time in response to the specialists' requirements. As usual in the world of archaeology, limited resources meant this wasn't possible at that stage. A lot of the application logic seemed to be held in the interfaces rather than in the relationships between tables, which meant a lot of data cleaning had to be done when centralising the databases and enforcing relationships.
As the 2004 Archive Report says, "The existing infrastructure was Microsoft Access based, and after consideration for minimal interruption to existing interfaces, and for the cost to the project of completely redeveloping the forms on another platform, these applications were retained."
Luckily, we're not tied to Access for new application development, and new and future database applications are created as HTML, eliminating any platform/OS compatibility issues.
This means that we can get on with more exciting things in the future! I'll post about some of those ideas soon.
In the meantime, check out the public version of the web interface to the Çatalhöyük database.
[Originally published on http://www.catalhoyuk.com/blog/, January 24, 2007]