This post on 'What comes after post-processualism' caught my eye, I guess because I have a fascination with the ways in which archaeological theory affects database design and digitisation strategies. I either work with contract archaeologists or on a post-processual site and the structural requirements are quite different, though both fundamentally rely on single context recording.

We have to face the fact that archaeological theory is quite simply no longer at the heart of archaeology, as it perhaps was from the 1960s until the end of the 1980s.

Instead we have seen over the last few decades an enormous expansion of commercial archaeology, now controlling far more funding than the Universities and responsible for the lion's share of archaeological research. We may or may not like that fact and what it led to in terms of research results but commercial archaeology is undeniably today a far bigger player in the discipline than its poor sibling, University-based research.

Update from Catalhoyuk

In the absence of real updates, I thought I'd post some of my site diary entries. The power has been very dodgy the past few days so I might do a big catch up on email and whatever in Konya on Friday (our day off). Interestingly, Blogger has decided to present me the site in Turkish, presumably based on IP location, because the language settings on the browser are English-only. So if things go a bit strange it's because I can't really see what I'm doing.

19/07/2007
My first day on site this season. I feel like I've been pole-axed with tiredness after the trip out here, so I'm concentrating on catching up on Sarah's documentation [for her work on site this year] and generally remembering how everything works.

24/07/2007
Just had a random thought, though it's a shame I didn't think of it at the start so we could tell everyone who's been on site over the season – any blog posts, photos or videos, blah blah blah, could use the same tag (like 'catal07') on public content, so it's easier to find everything from this season regardless of where it's held.

[And now that I'm posting this on a blog I suppose I should do that myself]

Tuesday, July 24, 2007

I keep bouncing between looking at the Figurines and Ceramics databases.
I've been nabbing poor Chris whenever he comes anywhere near the computer room and asking questions about heat treatment and cores; he's been very patient.

We had a long meeting in the cafe on Sunday, and I re-jigged the recording structures afterwards. I think possibly last year's structure was too ambitious, given the time constraints on everyone – not only for building it and mapping data from the old structures to the new, but mostly for the time it takes simply to record the objects, as it went into lots of technical detail that probably isn't sustainable at the moment.

With that in mind, I tipped the recording model on its head so that it's much more about observation than interpretation at this stage, particularly for colours and the various things that variations in colour indicate. For example, rather than breaking heat exposure down into manufacture, use, other events or post-deposition, for the moment it's enough to record that it's present. I've designed the forms to allow people to record the probable type of heat exposure (and how certain they are about it) if there's evidence to support it, but if there's no evidence either way they don't have to record a probable reason. The structures can be extended as we find out more about the raw materials around the site – I think they might change views on the intentionality represented by the presence of various inclusions.
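A minimal sketch of how that observation-first rule might look as a record type. The class name, field names and the two vocabularies are assumptions made up for illustration; the diary doesn't give the real lists of values or the actual form design.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabularies; the real lists of values are not given here.
HEAT_TYPES = {"manufacture", "use", "other event", "post-deposition"}
CERTAINTY = {"definite", "probable", "possible"}


@dataclass
class HeatExposure:
    """Observation first: 'present' is required, interpretation is optional."""

    present: bool
    probable_type: Optional[str] = None
    certainty: Optional[str] = None

    def __post_init__(self):
        if self.probable_type is not None:
            # An interpretation only makes sense if heat exposure was observed.
            if not self.present:
                raise ValueError("cannot record a type when heat exposure is absent")
            if self.probable_type not in HEAT_TYPES:
                raise ValueError(f"unknown heat exposure type: {self.probable_type}")
            # A probable type must say how certain the recorder is.
            if self.certainty not in CERTAINTY:
                raise ValueError("a probable type needs a certainty value")
        elif self.certainty is not None:
            raise ValueError("certainty recorded without a probable type")


# No evidence either way about the cause: just record that it's present.
observed_only = HeatExposure(present=True)
# Evidence supports an interpretation, so record it with a certainty.
interpreted = HeatExposure(present=True, probable_type="use", certainty="possible")
```

Keeping the interpretation fields optional means new heat exposure types can be added to the vocabulary later without touching records that only say 'present'.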

I've spent the day reviewing the ceramics database structures with a view to normalising them, and also to fitting them into the shared clay recording system. It's a continuation of work from previous seasons but with the added pressure(?) challenge(?) that other teams will be using versions of the ceramics databases soon too, so it's really important to get the data structures right. Nurcan has been really helpful and her explanations of some of the changes have helped me think about the best solutions to her recording issues.

Journalists were out yesterday and we had our photo taken under the Catalhoyuk sign near the gate. Apparently it'll be in Wednesday's papers. I wonder if people in London could pick up copies in the off-licences around Green Lanes. 'Famous in Turkey' – sounds like a band name.

It's funny how the diary entries are starting to read like blog entries, and in a way they seem to be functioning a bit like a blog too, with people commenting on each other's entries. I almost feel like I should add a field so people can record which diary entry they're writing about alongside the units, etc, but would that be far too self-referential?

When the database goes back to London and is put on the web I think we should put an 'AddThis' button on the various diary, finds and excavation pages so people can add pages to social bookmarking sites, blogs, etc. If we sign up for an account we can see how it's used – I wonder how much activity that kind of 'passive' use would see compared to 'active' use like commenting on finds or excavation data. I really need to find out more about the barriers to participation for actively creating content. I'll suggest the 'AddThis' thing to Ian and Shahina if I get a chance.

I'm starting to really wish I'd had my hair cut before I left London because it's taken on a life of its own. Not that it really matters out here, I guess.

Some of the history of the Catalhoyuk database

I was going to post this on the Catalhoyuk blog but authentication isn't working right now. So, I'll post it here and move it over when it's working again.

Just in case you thought nothing happened during the off-season…

A lot of this information is contained in the Archive Reports but as the audience for those is probably more specialised than the average reader of this blog, I thought it might be interesting to talk about them here.

When MoLAS first became involved with the project, there were lots of isolated Microsoft Access 2000 databases for excavation, finds and specialist data. I could see that the original database design was well structured and that much valuable work had been done on the database previously. However, some problems had arisen over the years as the database grew and different specialists brought their own systems based on a mixture of applications and platforms.

It was difficult for specialist databases to use live field or excavation data because it wasn't available in a single central source. It had also become almost impossible to run queries across excavation seasons or areas, or produce multi-disciplinary analysis, as there were disparate unrelated databases for each area of study. Within many specialisms the data set had been broken up into many different files – for example, the excavation database was split into teams and some teams were creating separate files for different years.

In many cases, referential integrity was not properly enforced in the interface or database structure. While the original database structures included tables to supply lists of values to enable controlled vocabularies, the interfaces were using static rather than dynamic menus on data entry interfaces. Primary and/or foreign keys were not implemented in some databases, leading to the possibility of multiple entries, anomalous data or incorrect codes being recorded. There was little or no validation on data entry.
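To make the difference concrete, here is a small sketch of a lookup table enforced with a foreign key, and a menu that reads its options from that table rather than from a hard-coded list. It uses Python's built-in SQLite module purely as a stand-in; the table names, column names and codes are invented for the example and aren't taken from the project databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE material_lov (              -- hypothetical list-of-values table
    code        TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE find (                      -- hypothetical finds table
    find_id       INTEGER PRIMARY KEY,
    unit_number   INTEGER NOT NULL,
    material_code TEXT NOT NULL REFERENCES material_lov(code)
);
INSERT INTO material_lov VALUES ('CL', 'clay'), ('OB', 'obsidian');
""")

# A code that exists in the lookup table is accepted.
conn.execute("INSERT INTO find (unit_number, material_code) VALUES (1001, 'CL')")

# A code that is not in the lookup table is refused by the database itself,
# whatever the data entry form does or doesn't check.
try:
    conn.execute("INSERT INTO find (unit_number, material_code) VALUES (1001, 'XX')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A 'dynamic' menu reads its options from the lookup table at the time it is shown.
menu_options = [row[0] for row in conn.execute(
    "SELECT code FROM material_lov ORDER BY code")]
```

With the rule held in the table relationships, any interface built on top inherits the same validation, which is the point of moving logic out of the forms.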

IBM generously donated two new servers, one for use on site and the other for the Cambridge office. This meant that we were able to install Microsoft SQL Server 2000 to use as a single backend database and start re-centralising the databases. This meant re-combining the disparate datasets into a single, central database, and reconfiguring the Access forms to use this new centralised backend.

Centralising and cleaning the data and interfaces was a bit of a slog (covered in more detail in the archive reports), and even now there are still bits and pieces to be done. I guess this shows the importance of proper database design and documentation, even when you think a project is only going to be small. I'm sure there was documentation originally, so I guess this also shows the importance of a good archiving system!

Unfortunately, because the 'business logic' of the database applications wasn't documented (if there was documentation it'd been lost over time) we couldn't re-do the existing forms in another application (like web forms) without losing all the validation and data entry rules that had been built up over time in response to the specialists' requirements. As usual in the world of archaeology, limited resources meant this wasn't possible at that stage. A lot of the application logic seemed to be held in the interfaces rather than in the relationships between tables, which meant a lot of data cleaning had to be done when centralising the databases and enforcing relationships.

As the 2004 Archive Report says, "The existing infrastructure was Microsoft Access based, and after consideration for minimal interruption to existing interfaces, and for the cost to the project of completely redeveloping the forms on another platform, these applications were retained."

Luckily, we're not tied to Access for new application development, and new and future database applications are created as HTML, eliminating any platform/OS compatibility issues.

This means that we can get on with more exciting things in the future! I'll post about some of those ideas soon.

In the meantime, check out the public version of the web interface to the Çatalhöyük database.

[Originally published on http://www.catalhoyuk.com/blog/, January 24, 2007]

Catalhoyuk diaries: What I did on my summer holidays

I was on site at Çatalhöyük for two weeks, and while I was there I contributed to the Catalhoyuk blog.

For me, it was a good opportunity to explain what I do on site – people are often confused about why a database developer would be going out to work on site in Turkey.

After that, I spent a few days in Istanbul, where I went to the Çatalhöyük exhibition and also to Istanbul Modern.

Then I caught a train to Bucharest, where I started my holiday. I visited Romania, Moldova, Transdniestr, Ukraine, and finally flew home from Krakow a month later.

Clay pipe recording at MoLAS and "Clay tobacco pipe makers' marks from London" website

[Update, September 2017: the site appears not to be supported by the Museum of London or MOLA, but there's an archived version at http://webarchive.nationalarchives.gov.uk/20090418203932/http://www.museumoflondon.org.uk/claypipes/index.asp that should provide access to equivalent pages of the links listed below.]

[Update, December 2011: if you're interested in clay pipes, you may be interested in Locating London's Past. The site also has an article that explains how Museum of London Archaeology (MoLA) Datasets – including clay pipes and glass – have been incorporated into the site.  NB: other than adding these links, I haven't updated the original 2006 paper below, so it doesn't include any enhancements made for this new work.  On a personal note, it's lovely to see that the sites, and the backend work behind them, still have value.]

Wheel symbol with pellets between the spokes, c 1610-40.

I'm just back from giving a paper at the Society for Clay Pipe Research Conference, held at the LAARC today. I thought I'd share the content of my paper online so that other people interested in digitising and publishing collections online could see how one particular project was implemented.

Some interesting feedback from the question session afterwards was that other archaeological units, museums or researchers might be interested in publishing records to the same site. In that case, I'd be happy to review the structures so they could be generalised (for other identifiers, for example) and publish them as an open standard along with more detailed information on the digitisation process.

Anyway, here's the text of the paper:

Clay pipe recording at MoLAS and the stamped makers’ mark website

Mia Ridge, Database Developer, Museum of London

SCPR ANNUAL CONFERENCE, September 16th 2006

London Archaeological Archive and Research Centre, Mortimer Wheeler House

Summary

The paper discusses the process from initial specification through requirements gathering, database design, development of the database application and website, to publication online.

Introduction

The project began with a proposal to create a database of clay tobacco pipe makers' marks from London:

"…a physical and digital database of clay tobacco pipe makers’ marks found in excavated contexts from London, dating to between c 1580 and 1910. This will encompass examples of makers’ marks, both stamped and moulded, on pipes made in London and imported from further afield, both in the UK and on the Continent. … The digital version of the database will be made available online, as part of the MoLAS website"

The work had two parts – enhancing the MoLAS Oracle database so that it could record more detailed information about the maker's marks; and creating a website to publish the marks and related images and information online.

Requirements gathering

'Requirements gathering' is the process of scoping and defining a project. The first step towards this is to define the internal and external stakeholders; the second is to determine their requirements. Internal stakeholder requirements include modified forms and structure for recording enhanced data and analysis, while external requirements relate to the publication of the data to defined groups of website users. It is important to define the targeted users of your website so that its content, site architecture and functionality can be tailored to them.

The targeted users of the site were largely determined by the subject matter. The main users will be specialists, followed by general adults. Site functionality, considered as search or browse capabilities, was determined as a balance between the purpose of the site, the needs of its visitors and the content and infrastructure we have available.

The database and website also had to be expandable to provide for greater temporal or geographic coverage, including collections throughout the Greater London area. Finally, in order to design data structures that would best meet the needs of the project, I had to consider the nature of the material to be recorded.

The first discussions with Jacqui and Tony were about the requirements for the website. We then met to review the existing Oracle data structures and discuss the necessary changes. I asked lots of questions about how makers' marks related to clay pipes – where, what kind and how many might appear on a pipe? It was important to understand how they varied, and which properties of position, type and method were significant, as well as to understand the exceptions. As you know, one 'IS' stamp is not necessarily the same as another 'IS' stamp – the trick is to enable the application to understand the difference. In this process, the aim is not to uncover the detail of the subject but to understand how its typologies are constructed.

Once the requirements had been determined, data structures were designed accordingly. These were presented to Jacqui and Tony, and reviewed in response to their feedback. Prototype forms were then designed to allow data entry, and the same process of feedback and modification followed. Significant changes were made after testing and further modifications were made as necessary during the implementation process as the practical implications of the modifications became clear.

One of the challenges of database design is balancing the benefits of recording in a more structured way, which provides for much greater flexibility in analysis, search and publication, against a smaller learning curve and greater efficiency in data entry. For example, as different types of information are separated out of free text or general comments into more precise fields, the time required to record each entry increases.

As the data structures were finalised, queries were run to populate the modified structures with existing data, where possible.

Database design and development

The MoLAS Oracle database is used by our archaeologists and specialists to record field, find and environmental data. It has been developed in-house over many years and is one of the largest databases of its kind in the UK. As the database and forms are maintained in-house, we are able to modify it to meet our needs as required for projects or day-to-day business.

In the MoLAS database an individual pipe record must have a unique combination of sitecode, context, accession number and form that is different from any other pipe record. This unique identifier forms the basis of the database application. This combination of identifiers, called a 'primary key', can be used to create links from a pipe with a particular mark to possible pipe makers. The sitecode can also be used to link to information about the particular excavation. Should a specialist desire, they can also link to other finds from the same context as well as related excavation and environmental data. The existing table structure was modified to support recording clay pipes and maker's marks in a more semantically structured way, with more detail and additional attributes.
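As a rough illustration of a four-part primary key, here is a sketch using Python's built-in SQLite module rather than Oracle. The table and column names, and every value inserted, are hypothetical; only the four identifying fields come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipe (
    sitecode  TEXT    NOT NULL,
    context   INTEGER NOT NULL,
    accession INTEGER NOT NULL,
    form      TEXT    NOT NULL,
    mark      TEXT,
    -- The combination of the four identifiers must be unique.
    PRIMARY KEY (sitecode, context, accession, form)
);
""")

# Two different pipes from the same context (made-up values).
conn.execute("INSERT INTO pipe VALUES ('ABC01', 120, 5, 'AO10', 'IS')")
conn.execute("INSERT INTO pipe VALUES ('ABC01', 120, 6, 'AO10', 'IS')")

# Re-using an existing combination of identifiers is refused.
try:
    conn.execute("INSERT INTO pipe VALUES ('ABC01', 120, 5, 'AO10', 'WB')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Part of the key (sitecode and context) is enough to find other pipes
# from the same context.
same_context = conn.execute(
    "SELECT accession FROM pipe WHERE sitecode = ? AND context = ? ORDER BY accession",
    ("ABC01", 120),
).fetchall()
```

Because the sitecode and context are part of the key, the same values can be used to reach excavation records or other finds from that context without any extra linking field.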

The existing comments field was split into four new fields: general comments, maker comments, publication references, and parallels. I ran a report that listed all the existing comments so they could be manually reviewed and separated out into the relevant content areas.

A new field was added to mark pipe records that were to be published on the web. Other enhancements included new fields such as completeness, mould, manufacturing evidence, fabric, pipe length, links to photographs and illustrations, as well as a new numerical field, 'die' to allow the recording of individual dies known to have been used by a single maker or workshop. The final new field was one that allows a particular pipe to be marked as containing the best example of a particular mark.

Some of the new fields required the creation of lists of values. These appear as drop-down menus on the data entry forms, and are used to make data entry faster and reduce errors. They are implemented as tables and can be designed so the values can be edited or added to as required.

When creating new fields, it's important to judge the effect on existing data, particularly in a project that can only selectively enhance records. As the project grant covered the enhancement of records for 120 marked clay pipes made between c 1580 and 1680, a small percentage of the entire dataset, many existing records would not have any data for the new fields. If it is not possible to go back and record the relevant information in the new field for each existing record, might that affect the validity of the data set as a whole? Will queries or searches return unexpected results if values aren't recorded consistently? Sometimes it is possible to apply a default value for existing records, or to mark previous records as 'not recorded'.
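A small sketch of the 'not recorded' approach, again with SQLite standing in for the real database. The table, the marks and the 'complete bowl' value are invented; the point is only that older records get an explicit value rather than an empty field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pipe (pipe_id INTEGER PRIMARY KEY, mark TEXT);
INSERT INTO pipe (mark) VALUES ('IS'), ('WB'), ('TD');

-- Adding the new field with a default marks every existing record explicitly.
ALTER TABLE pipe ADD COLUMN completeness TEXT NOT NULL DEFAULT 'not recorded';
""")

# Only the records covered by the enhancement work get a real value.
conn.execute("UPDATE pipe SET completeness = 'complete bowl' WHERE mark = 'IS'")

# Queries can now tell 'not yet assessed' apart from a recorded value.
counts = dict(conn.execute(
    "SELECT completeness, COUNT(*) FROM pipe GROUP BY completeness"))
```

A search for a particular completeness value then leaves out the unenhanced records for a visible reason, and a report can state how much of the dataset has actually been assessed.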

New tables were created to record information about known pipe makers. This includes their name, address, earliest and latest known dates, and free text including documentary evidence for this information. As this information is recorded in the database rather than in text files, it can be more easily searched and combined with related information and pipes for publication.

Additional tables were created to record the relationship between a mark on a particular pipe and a possible maker, including the probability of any pipe being related to an individual maker plus any publication references.
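The kind of linking table described here might be sketched as follows. This is a simplified illustration in SQLite, not the MoLAS structure: it uses a single hypothetical pipe_id instead of the full four-part key, the maker names and dates are placeholders, and the three probability values are assumed from the definite/probable/possible wording used on the makers form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the links between tables
conn.executescript("""
CREATE TABLE pipe (
    pipe_id INTEGER PRIMARY KEY,          -- stand-in for the real composite key
    mark    TEXT NOT NULL
);
CREATE TABLE maker (
    maker_id      INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    earliest_date INTEGER,
    latest_date   INTEGER
);
CREATE TABLE pipe_maker (
    pipe_id     INTEGER NOT NULL REFERENCES pipe(pipe_id),
    maker_id    INTEGER NOT NULL REFERENCES maker(maker_id),
    probability TEXT NOT NULL
        CHECK (probability IN ('definite', 'probable', 'possible')),
    reference   TEXT,                     -- publication reference, if any
    PRIMARY KEY (pipe_id, maker_id)
);
INSERT INTO pipe VALUES (1, 'IS');
INSERT INTO maker VALUES (1, 'Example Maker A', 1620, 1640),
                         (2, 'Example Maker B', 1630, 1660);
INSERT INTO pipe_maker VALUES (1, 1, 'probable', NULL),
                              (1, 2, 'possible', NULL);
""")

# All candidate makers for one pipe, with how likely each attribution is.
rows = conn.execute("""
    SELECT maker.name, pipe_maker.probability
    FROM pipe_maker
    JOIN maker ON maker.maker_id = pipe_maker.maker_id
    WHERE pipe_maker.pipe_id = ?
    ORDER BY maker.name
""", (1,)).fetchall()
```

Holding the probability on the link rather than on the pipe or the maker is what lets one 'IS' mark point to several candidate makers with different degrees of certainty.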

Content preparation

 

Enhancing database records

The basic process for enhancing stamped pipe records was:

    1. Add the webcode 'CoLAT' to the pipes that will be included in this project

    2. Add the photo number to the Photo number field

    3. Review and update the pipes entries

    4. Create the makers entries

    5. Add pipes to the sub-form on the makers form to create the link between pipe and definite/probable/possible maker

Two queries were created to help monitor progress and give an idea of how the data would look on the website. One was a report showing which records have been successfully marked up for export to the website. The other showed how the links between makers and pipes would be displayed and could be used to check the success of a link created between a mark and possible maker.

Other content preparation

While the technical database and website development and specialist recording work was underway, Jacqui had organised for the MoLAS photographer to take photos of the marks that were going to appear on the website. The photo number was then recorded in the database. The scripts that generate the website use this to link the right image to the right pipe and mark for display on the web page. Jacqui also wrote text for inclusion on the site and definitions of codes used in the database were created, to make the published records more user-friendly and clear to non-specialists.

The website

The address of the website, 'Clay tobacco pipe makers' marks from London' is http://www.museumoflondon.org.uk/claypipes/
It is held as a 'collections microsite' within the Museum of London website structure.

The front page

The front page is designed to provide direct access to the data while contextualising the content, making the current scope of the project and the goals of the site immediately clear. It also allows us to thank our funders.

The design of the website was based on templates developed for the LAARC site. The front page introduces the navigation and title banner, which remain consistent throughout the site. There are three immediately clear 'calls to action' for the user on the front page: browse maker's marks, browse makers, and search for marks.

Browse maker's marks

http://www.museumoflondon.org.uk/claypipes/pages/marks.asp

In this section of the site, you can view thumbnails of the best example of each mark. The initials or description of the mark are listed with each thumbnail. This means you can search the text of the page for a particular mark, and also aids accessibility and helps search engines index the site, while still being visually appealing.

View mark

From the list of marks, you can access the page for a particular mark. Where appropriate, this page contains a more detailed description of the mark; images of all the pipes with that mark plus the sitecode, excavation context number and bowl form for that pipe. It also displays the dates associated with that form and the die number for each pipe on the page. Each image of a mark is also a link to the particular pipe page.

View pipe

Each pipe page contains the initials or description of its maker's mark, a description of the pipe, its burnishing and milling as well as information about the excavation in which the pipe was found. This includes the address, easting and northing of the site. It would be possible to link to the full site record in LAARC, particularly when the LAARC site has been redeveloped. The page also lists any possible or known makers and the certainty of their being the maker of that pipe. The name of each maker is a link to the maker.

View maker

This page displays the name, address, earliest and latest dates plus any additional commentary and publication references for the maker. It also lists each pipe they might have made with the probability of their being the maker.

Browse makers

http://www.museumoflondon.org.uk/claypipes/pages/makers.asp
This page displays a list of all makers on the site. The name of each maker is a link to the full information about that maker, as above.

Search

http://www.museumoflondon.org.uk/claypipes/pages/search.asp
The search for this site is fairly simple, but the functionality can be expanded if necessary.
It searches the description field for a match to the search term.
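A simple description search of this kind could look something like the sketch below. It is written in Python with SQLite rather than the ASP used on the site, and the table, the function name and two of the three sample descriptions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mark (mark_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
INSERT INTO mark (description) VALUES
    ('Wheel symbol with pellets between the spokes'),
    ('Initials IS in a circular frame'),
    ('Crowned rose');
""")


def search_marks(conn, term):
    """Return descriptions containing the search term (hypothetical helper)."""
    # Escape LIKE wildcards so the visitor's text is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    # The term is passed as a parameter, never pasted into the SQL string.
    cursor = conn.execute(
        "SELECT description FROM mark "
        "WHERE description LIKE ? ESCAPE '\\' ORDER BY mark_id",
        (pattern,),
    )
    return [row[0] for row in cursor]


results = search_marks(conn, "wheel")
```

Extra search parameters, such as date range or bowl form, would be added as further conditions in the same query.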

About the project

http://www.museumoflondon.org.uk/claypipes/pages/about.asp

This section contains excellent information for general visitors and specialists about the project, why clay pipes are important for archaeologists, the clay pipes of London as well as a glossary and references. These pages contextualise the study of clay pipes, enriching the general visitor's experience and providing specific information about the background to tobacco pipe makers’ marks found in excavated contexts from London for specialist users.

From the database to the website

The first step in publishing the enhanced content online is getting data from the internal database to the web server. The web server is the computer that sends out the pages when a visitor clicks on the link.

SQL scripts run on the MoLAS server extract data from the MoLAS Oracle database and other sources, combining them into a form suitable for publication on the website. This data is stored in tables on the web server database. Information about the archaeological sites is drawn from data published through the London Archaeological Archive and Research Centre (LAARC) web site. This data is linked through the sitecode on the pipe or mark database record.

The scripts also combine information that has been stored separately into the publication format according to the relationships defined in the database. For example, they may bring together a pipe with possible makers through the pipe marks recorded. Where necessary, the database extraction scripts also extract the code definitions from List of Value tables. These translate a value like the mark position code 'BR' into the more user-friendly 'on the bowl, on right side as smoked'.
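As an illustration of that translation step, the sketch below joins pipe records to a list-of-values table while building a publication table. It uses SQLite through Python as a stand-in for the Oracle extraction scripts; the table and column names and the sample pipes are hypothetical, and only the 'BR' code and its definition come from the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mark_position_lov (
    code       TEXT PRIMARY KEY,
    definition TEXT NOT NULL
);
CREATE TABLE pipe (
    pipe_id       INTEGER PRIMARY KEY,
    mark          TEXT,
    position_code TEXT
);
INSERT INTO mark_position_lov VALUES ('BR', 'on the bowl, on right side as smoked');
INSERT INTO pipe VALUES (1, 'IS', 'BR'), (2, 'WB', NULL);

-- Build the table the website reads, with codes replaced by their definitions.
CREATE TABLE web_pipe AS
    SELECT pipe.pipe_id,
           pipe.mark,
           COALESCE(lov.definition, 'not recorded') AS mark_position
    FROM pipe
    LEFT JOIN mark_position_lov AS lov ON lov.code = pipe.position_code;
""")

published = conn.execute(
    "SELECT pipe_id, mark, mark_position FROM web_pipe ORDER BY pipe_id"
).fetchall()
```

The LEFT JOIN keeps pipes with no position code in the published table, so a missing value shows up as 'not recorded' rather than dropping the pipe from the site.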

The website is generated by another collection of scripts, using a web scripting language called ASP. These scripts can be thought of as templates that contain placeholders for different types of information or images. When a page is requested, the script runs and fills the appropriate information in the appropriate part of the template.
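The template idea can be sketched in a few lines. This example uses Python's standard string templates instead of ASP, and the page layout, field names and record values are all invented for illustration.

```python
from html import escape
from string import Template

# A page template with placeholders for the parts that change per pipe.
PIPE_PAGE = Template("""<h1>Mark: $mark</h1>
<p>$description</p>
<p>Position: $position</p>""")


def render_pipe_page(record):
    """Fill the template from one database record (hypothetical helper)."""
    # Escape values so text from the database can't break the page markup.
    safe = {key: escape(str(value)) for key, value in record.items()}
    return PIPE_PAGE.substitute(safe)


page = render_pipe_page({
    "mark": "IS",
    "description": "Bowl with milling <partial>",
    "position": "on the bowl, on right side as smoked",
})
```

Changing the layout means editing PIPE_PAGE once; every pipe page picks up the change the next time it is requested.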

Because these templates dynamically generate the pages, the design and the content are separated. This means the site can be expanded as new records are added to the database. Updated or new content can appear on the site instantly, without waiting for IT resources to be available. It's also easy to update the templates so that new fields can be viewed, new links generated or additional search parameters added. The graphic design (the 'look and feel' of the site) or the site navigation can be updated in a single script and the change is immediately visible across all site pages.

The 'about' section of the website also contains 'static' pages. These pages do not change when the content of the database changes. However, they use the same scripts to generate the design and navigation so can easily be changed as necessary.

One of the requirements for the site was that it was accessible to search engines. As the project did not have a budget for marketing the site, search engines were going to be the main source of website visitors. The site was designed using 'semantic markup' which not only helps search engines understand the structure of the site, but also aids accessibility for people with disabilities.

Catalhoyuk diaries: August 2006

It's hard to believe my time here has flown so quickly. I’m sad to be leaving on Friday, and I feel like there’s not enough time to get everything I want to done before I go. The Istanbul team left today and some specialists are starting to leave, so the site already feels a little like it’s winding down for the season.

This season I've mostly been concentrating on working through recording structures and developing the interface for a unified Clay Objects database application.

I've been talking about my ideas for this application for the past two years so it's great to finally get a chance to make it real. I've had lots of intense discussions with pottery, figurines and building materials specialists as well as a new member of the team who is looking at the changes in the fabric of the site across artefact types and excavation processes.

Working with someone who'll actually be benefiting from this unified approach to recording, and re-examining existing recording so that as far as possible clay fabrics or matrices are recorded according to diagnostic evidence (particularly someone with a background in geology who is able to bring a lot of technical expertise to the project) has been a real advantage.

I've also been working with the Human Remains team on their database, but they've made my job easy by developing their own forms based on the data structures I sent from London. They've been really inspiring – it's one thing to tell people that they have the power to create their own interfaces and queries, but it's so much better to see people actually do it.

In between that (and sometimes the development happens in-between solving other problems) I've been dealing with smaller fixes to existing databases, network issues and all the kinds of things that come up when people ask you any computer-related questions.

My brain has been working overtime; there are so many new ways of interrogating the databases now that everything is becoming integrated that the possibilities seem endless. We’ve had a few seminars, and people presenting their PhD research, and I come out with ideas for new improvements every time. It’s also been fascinating hearing how their data translates into a picture of life lived on the mound.

The model of creating 'new' database applications by combining existing data across specialisms with new interpretation and specialist recording will be the basis for the architectural and beads databases, and I'm already excited by that. One idea that fascinates me is the idea of recording things like the wall paintings in the database so they can be linked to other representational artefacts like figurines and stamp seals. A 'representational database' could look at similarities and differences in images across all and any materials or artefact types. Do wall paintings show the same kinds of artistic, personal or cultural concerns as the figurines? Do certain types of features occur with certain kinds of representational artefacts?

Anyway, I don’t want to hog the internet computer, so it’s back to work for me.

[Originally published on http://www.catalhoyuk.com/blog/, August 2, 2006]

Catalhoyuk diaries: Settling in

It's my first proper working day on site this season and I'm slowly working my way through Sarah's documentation of the database work and general IT issues she's encountered while she was here. At this stage, I only have one major 'new' application to work on, and in large part that's thanks to Sarah's hard work over the past months, both on- and off-site.

I'm hoping that now that the hard grunt work of centralising, bug fixing, cleaning and consolidating the existing databases over the past few years is (mostly) over, and the applications I created in previous years are bedded in, I'll have a real chance to think about what else we can do with all this data. I was so busy before I left London that I hadn't really had a chance to get excited about coming back to Catalhoyuk but as soon as I was on my way I realised that this could be an immensely intellectually rich and rewarding two weeks.

There's always so much new technology, I'm sure there's a knack to not getting carried away by every new possibility. But I can't help but wonder what would happen if we recreated Catalhoyuk in Second Life or another 3D world. Imagine re-populating the mound with a living community of real people!

I'd love to see how we could use semantic web/Web 2.0 technologies to open up our data to the rest of the world. I'm interested in the tagging technologies emerging through folksonomies like steve.museum, and wonder if we could apply them to the finds data we publish on the web.

I've realised that you could almost think of the excavation diary entries as blog posts, in which case Catalhoyuk has a blog that goes back to 1997.

[Originally published on http://www.catalhoyuk.com/blog/, July 22, 2006]