Archaeology – Open Objects

Catalhoyuk question

This will only be relevant to the archaeologists, I guess, but it has occurred to me to ask – what would you like to see in the Catalhoyuk archive reports? What information would either be useful or satisfy your curiosity?

In a wider sense, what can we (as IT geeks in the cultural heritage sector) learn from each other? What are we too scared to ask in case it's a stupid question, or because it seems too obscure? What don't we share because we assume that everyone else knows it already?

Final diary entry from Catalhoyuk

I'm back in London now but here goes anyway:

August 1
My final entry of the season, as I'm on the overnight train from Cumra to Istanbul tonight. After various conversations on the veranda I've been thinking about the intellectual accessibility of our Catalhoyuk data and how that relates to web publication, and writing this entry is a good way to stop these thoughts running round my head like a rogue tune.

[This has turned into a long entry, and I don't say anything trivial about the weather or other random things so you'd have to be pretty bored to read it all. Shouldn't you be working instead?]

Getting database records up on the website isn't hard – it's just a matter of resources. The tricky part is providing an interesting and engaging experience for the general visitor, or a reliable, useable and useful data set for the specialist visitor.

At the moment it feels like a lot of good content is hidden within the database section of the website. When you get down to browsing lists of features, there's often enough information in the first few lines to catch your interest. But when you get to lists of units, even the pages with some of the unit description presented alongside the list, you start to encounter the '800 lamps' problem.

[A digression/explanation – I'm working on a website at the Museum of London with a searchable/browsable catalogue of objects from Roman Londinium. One section has 800 Roman oil lamps – how on earth can you present that to the user so they can usefully distinguish between one lamp and another?]

Of course, it does depend on the kind of user and what they want to achieve on the Londinium site – for some, it's enough to read one nicely written piece on the use of lamps and maybe a bit about what different types meant, all illustrated with a few choice objects; specialist users may want to search for lamps with very particular characteristics. Here, our '800 lamps' are 11846 (and counting) units of archaeology. The average user isn't going to read every unit sheet, but how can they even choose one to start with? And how many will know how to interpret and create meaning from what they read about the varying shades of brown stuff? Being able to download unit sheets that match a particular pattern – part of a certain building, ones that contain certain types of finds, units related to different kinds of features – is probably of real benefit to specialist visitors, but are we really giving those specialist visitors (professional or amateur) and our general visitors what they need? I'm not sure a raw huge list of units or flotation numbers is of much help to anyone – how do people distinguish between one thumbnail of a lamp or one unit number and another in a useful and meaningful way? I hope this doesn't sound like a criticism of the website – it's just the nature of the material being presented.

The variability of the data is another problem. It's not just about data cleaning (though the 'view features by type' page shows why data cleaning is useful) but about the difference between the beautiful page for Building 49 and the rather less interesting page for Building 33 (to pick one at random). If a user lands on one of the pages with minimal information, they may never realise that some pages have detailed records with fantastic plans and photos.

So there are the barriers to entry that we might accidentally perpetuate by 'hiding' the content behind lists of numbers; and there is the general intellectual accessibility of the information to the general user. Given limited resources, where should our energies be concentrated? Who are our websites for?

It's also about matching the data and website functionality to the user and their goals – the excavation database might not be of interest to the general user in its most raw form, and that's ok because it will be of great interest to others. At a guess, the general public might be more interested in finds, and if that's the case we should find ways to present the objects with appropriate interpretation and contextualisation, not only to inform people but also to help them have a more meaningful experience on the site.

I wonder if 'team favourite' finds or buildings/spaces/features could be a good way into the data, a solution that doesn't turn some kinds of finds or some buildings into 'treasure' that seems more important than the rest. Or perhaps specialists could talk about a unit or feature they find interesting – along the way they could explain how their specialism contributes to the archaeological record (written as if for an intelligent thirteen-year-old). For example, Flip could talk about phytoliths, or Stringy could talk about obsidian, and what their finds can tell us.

Proper user evaluation would be fabulous, but in the absence of resources, I really should look at the stats and see how the site is used. I wonder if I could do a SurveyMonkey survey to get general information from different types of users? I wonder what levels of knowledge our visitors have about the Neolithic, about Anatolian history, etc. What brings them to the website? And what makes them stick around?

Intellectual accessibility doesn't only apply to the general public – it also applies to the accessibility of other teams' or labs' content. There are so many tables hidden behind the excavation and specialist database interfaces – some are archived, some had a very particular usage, some are still actively used but still have the names of long-gone databases. It's all very well encouraging people to use the database to query across specialisms, but how will they know where to look for the data they need? [And if we make documentation, will anyone read it?]

It was quite cool this morning but now it's hot again. Ha, I lied about not saying anything trivial about the weather! Now go do some work.
(Started July 29, but finally posted August 1)

Catalhoyuk diaries: In the absence of real world content…

…more diary entries from Catalhoyuk.
July 24, started late at night in tent. [but posted as July 25, 2007]
Spent some time whizzing things around in ArcGIS the past few days. I never get to play with GIS at work so it's quite fun. I need to talk to Cord and Dave about what views/tables they need in the database to link in the excavation and finds data. It's a shame we didn't get to experiment with bringing data across before Cord left but hopefully Dave and I can do a 'proof of concept' that she can play with when she gets back to site. Maybe skellies or X-finds, or just basic unit information as a first go.

Today was going to be a solid day of programming, but the power was out for three hours last night and two hours of lab time this morning, so I'm still catching up on the stuff I was going to finish last night.

Last night I wrote myself a note from the geek perspective: "I think the challenge of Catal is combining the reflexive, the uncertain, the indeterminate, with the rigorous requirements of structured recording in a database; and perhaps more importantly, convincing people that it's possible to design to allow for uncertainty and for multivocality." In the light of day that sounds like pretentious tosh that could only have been thought up in the middle of the night. Well done me.

July 30, middle of the day.
It's really been quite hot, though just now a change is coming through and it's getting cooler. I'm such a (lab, not southern) jessie, I can't imagine what it's like working up on the mound – apparently it's been 48C in the south shelter. The feel of the coming storm reminds me of Melbourne but I bet the heat won't break after the storm like it would at home. I hope my borrowed tent doesn't blow away.

Very frustrated by the power cuts. I feel like I'm losing lots of work time to them. My list of things to do is getting longer and my time is getting shorter. I wish it was the other way around.

I've been documenting some of the database tables in preparation for an informal tutorial on database querying tomorrow. I say 'informal' but that's really just cos I don't have time to prepare so I'll wing it. Hopefully people will have some good examples. I think it'll also point out where we need to make improvements – putting all the relationships into the central database is probably the first thing to do, so that they automatically come through into the AllTables database. This will make it a lot easier for people to join tables as some joins will be created auto-magically for them. It's probably not worth documenting all the tables at the field level, but there are some (where the data type is different between old databases, for example) where it would be useful. The descriptions could also serve as synonyms to help people find the tables they need for their queries.
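The idea of table descriptions doubling as synonyms can be sketched in a few lines of Python. This is a minimal sketch only: the table names and descriptions below are hypothetical placeholders, not the real AllTables contents. The point is that a plain-language description lets someone search for 'flotation' and find a table whose name they would never have guessed.

```python
# Hypothetical table catalogue: names and descriptions are illustrative only,
# not the real Catalhoyuk schema.
TABLE_DOCS = {
    "Unit_Sheet": "excavation unit description, deposit, layer, context",
    "XFinds": "special finds, small finds, registered objects",
    "Flot_Sample": "flotation sample, heavy residue, plant remains",
}


def find_tables(term: str) -> list[str]:
    """Return table names whose name or description mentions the search term."""
    term = term.lower()
    return sorted(
        name
        for name, desc in TABLE_DOCS.items()
        if term in name.lower() or term in desc.lower()
    )
```

For example, `find_tables("context")` would point someone to `Unit_Sheet` even though the word 'context' never appears in the table name.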

Update from Catalhoyuk

In the absence of real updates, I thought I'd post some of my site diary entries. The power has been very dodgy the past few days so I might do a big catch-up on email and whatever in Konya on Friday (our day off). Interestingly, Blogger has decided to present the site to me in Turkish, presumably based on my IP location, as my browser's language settings are English-only. So if things go a bit strange it's because I can't really see what I'm doing.

19/07/2007
My first day on site this season. I feel like I've been pole-axed with tiredness after the trip out here, so I'm concentrating on catching up on Sarah's documentation [for her work on site this year] and generally remembering how everything works.

24/07/2007
Just had a random thought, though it's a shame I didn't think of it at the start so we could tell everyone who's been on site over the season – any blog posts, photos or videos, blah blah blah, could use the same tag (like 'catal07') on public content, so it's easier to find everything from this season regardless of where it's held.

[And now that I'm posting this on a blog I suppose I should do that myself]

Tuesday, July 24, 2007

I keep bouncing between looking at the Figurines and Ceramics databases.
I've been nabbing poor Chris whenever he comes anywhere near the computer room and asking questions about heat treatment and cores; he's been very patient.

We had a long meeting in the cafe on Sunday, and I re-jigged the recording structures afterwards. I think last year's structure was possibly too ambitious, given the time constraints on everyone – not only for building it, but for mapping data from old structures to the new, and mostly in the time it takes simply to record the objects, as it went into a lot of technical detail that probably isn't sustainable at the moment.

With that in mind, I tipped the recording model on its head so that it's much more about observation than interpretation at this stage, particularly for colours and the various things that variations in colour indicate. For example, rather than breaking heat exposure down into manufacture, use, other events or post-deposition, for the moment it's enough to record that it's present. I've designed the forms to allow people to record the probable type of heat exposure (and how certain they are about it) if there's evidence to support it, but if there's no evidence either way they don't have to record a probable reason. The structures can be extended as we find out more about the raw materials around the site – I think what we learn might change views on the intentionality represented by the presence of various inclusions.
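As a rough illustration of 'observation first, interpretation optional', here is a minimal Python sketch of one heat-exposure record. The four categories come from the entry above; the certainty levels and all class and field names are hypothetical, and this is not the actual Access/SQL Server structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HeatExposure(Enum):
    # Categories taken from the diary entry; the codes are hypothetical.
    MANUFACTURE = "manufacture"
    USE = "use"
    OTHER_EVENT = "other event"
    POST_DEPOSITION = "post-deposition"


class Certainty(Enum):
    # Hypothetical certainty scale for illustration only.
    POSSIBLE = "possible"
    PROBABLE = "probable"
    CERTAIN = "certain"


@dataclass
class HeatObservation:
    present: bool                                  # the observation itself
    probable_type: Optional[HeatExposure] = None   # optional interpretation
    certainty: Optional[Certainty] = None          # how sure the recorder is

    def __post_init__(self) -> None:
        # An interpretation only makes sense if heat exposure was observed.
        if self.probable_type is not None and not self.present:
            raise ValueError("probable_type recorded but heat exposure not present")
        # A certainty only makes sense alongside an interpretation, and vice versa.
        if (self.probable_type is None) != (self.certainty is None):
            raise ValueError("record probable_type and certainty together, or neither")
```

`HeatObservation(present=True)` is a complete record on its own; the interpretation fields can be filled in later when there's evidence, which is what lets the structure be extended without forcing people to guess.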

I've spent the day reviewing the ceramics database structures with a view to normalising them, and also to fitting them into the shared clay recording system. It's a continuation of work from previous seasons but with the added pressure(?) challenge(?) that other teams will be using versions of the ceramics databases soon too, so it's really important to get the data structures right. Nurcan has been really helpful and her explanations of some of the changes have helped me think about the best solutions to her recording issues.

Journalists were out yesterday, we had our photo taken under the Catalhoyuk sign near the gate. Apparently it'll be in Wednesday's papers. I wonder if people in London could pick up copies in the off-licences around Green Lanes. 'Famous in Turkey' – sounds like a band name.

It's funny how the diary entries are starting to read like blog entries, and in a way they seem to be functioning a bit like a blog too, with people commenting on each other's entries. I almost feel like I should add a field so people can record which diary entry they're writing about alongside the units, etc, but would that be far too self-referential?

When the database goes back to London and is put on the web I think we should put an 'AddThis' button on the various diary, finds and excavation pages so people can add pages to social bookmarking sites, blogs, etc. If we sign up for an account we can see how it's used – I wonder how much activity that kind of 'passive' use would see compared to 'active' use like commenting on finds or excavation data. I really need to find out more about the barriers to participation for actively creating content. I'll suggest the 'AddThis' thing to Ian and Shahina if I get a chance.

I'm starting to really wish I'd had my hair cut before I left London because it's taken on a life of its own. Not that it really matters out here, I guess.

I'm off to Catalhoyuk for two weeks

I'm off to Çatalhöyük, Turkey, for two weeks, for the usual database analysis/design/development. I don't imagine I'll have any time to post updates, so you can make do with photos from previous years in the meantime.


Catalhoyuk 2004, 2005, 2006 photoset

Some of the history of the Catalhoyuk database

I was going to post this on the Catalhoyuk blog but authentication isn't working right now. So, I'll post it here and move it over when it's working again.

Just in case you thought nothing happened during the off-season…

A lot of this information is contained in the Archive Reports but as the audience for those is probably more specialised than the average reader of this blog, I thought it might be interesting to talk about them here.

When MoLAS first became involved with the project, there were lots of isolated Microsoft Access 2000 databases for excavation, finds and specialist data. I could see that the original database design was sound and that much valuable work had been done on the database previously. However, some problems had arisen over the years as the database grew and different specialists brought their own systems based on a mixture of applications and platforms.

It was difficult for specialist databases to use live field or excavation data because it wasn't available in a single central source. It had also become almost impossible to run queries across excavation seasons or areas, or to produce multi-disciplinary analysis, as there were disparate unrelated databases for each area of study. Within many specialisms the data set had been broken up into many different files – for example, the excavation database was split into teams and some teams were creating separate files for different years.

In many cases, referential integrity was not properly enforced in the interface or database structure. While the original database structures included tables to supply lists of values to enable controlled vocabularies, the interfaces were using static rather than dynamic menus on data entry interfaces. Primary and/or foreign keys were not implemented in some databases, leading to the possibility of multiple entries, anomalous data or incorrect codes being recorded. There was little or no validation on data entry.
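To show what enforcing these rules in the database structure (rather than only in the interface) looks like, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite is a stand-in here, not the SQL Server 2000 backend the project actually used, and the table names, codes and find numbers are hypothetical. The lookup table supplies the controlled vocabulary, the primary key blocks duplicate entries, and the foreign key blocks codes that aren't in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")

# Lookup table supplying the controlled vocabulary (hypothetical values).
conn.execute("CREATE TABLE material_type (code TEXT PRIMARY KEY, label TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO material_type (code, label) VALUES (?, ?)",
    [("CLAY", "Clay"), ("OBS", "Obsidian"), ("BONE", "Animal bone")],
)

# Finds table: the primary key stops duplicate find numbers; the foreign key
# stops material codes that are not in the lookup table.
conn.execute(
    """
    CREATE TABLE find (
        find_id TEXT PRIMARY KEY,
        unit INTEGER NOT NULL,
        material TEXT NOT NULL REFERENCES material_type(code)
    )
    """
)


def add_find(find_id: str, unit: int, material: str) -> bool:
    """Try to insert a find; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO find VALUES (?, ?, ?)", (find_id, unit, material))
        return True
    except sqlite3.IntegrityError:
        return False


def menu_options() -> list[str]:
    """Build a data-entry menu from the lookup table instead of a static list."""
    return [row[0] for row in conn.execute("SELECT label FROM material_type ORDER BY label")]
```

Because the menu is read from `material_type`, adding a new code to the lookup table updates every data-entry form that uses it, which is the practical difference between dynamic and static menus.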

IBM generously donated two new servers, one for use on site and the other for the Cambridge office. This meant that we were able to install Microsoft SQL Server 2000 as a single backend database and start re-centralising the databases: re-combining the disparate datasets into a single, central database, and reconfiguring the Access forms to use this new centralised backend.

Centralising and cleaning the data and interfaces was a bit of a slog (covered in more detail in the archive reports), and even now there are still bits and pieces to be done. I guess this shows the importance of proper database design and documentation, even when you think a project is only going to be small. I'm sure there was documentation originally, so I guess this also shows the importance of a good archiving system!

Unfortunately, because the 'business logic' of the database applications wasn't documented (if there was documentation it'd been lost over time) we couldn't re-do the existing forms in another application (like web forms) without losing all the validation and data entry rules that had been built up over time in response to the specialists' requirements. As usual in the world of archaeology, limited resources meant this wasn't possible at that stage. A lot of the application logic seemed to be held in the interfaces rather than in the relationships between tables, which meant a lot of data cleaning had to be done when centralising the databases and enforcing relationships.

As the 2004 Archive Report says, "The existing infrastructure was Microsoft Access based, and after consideration for minimal interruption to existing interfaces, and for the cost to the project of completely redeveloping the forms on another platform, these applications were retained."

Luckily, we're not tied to Access for new application development, and new and future database applications are created as HTML, eliminating any platform/OS compatibility issues.

This means that we can get on with more exciting things in the future! I'll post about some of those ideas soon.

In the meantime, check out the public version of the web interface to the Çatalhöyük database.

[Originally published on http://www.catalhoyuk.com/blog/, January 24, 2007]

Catalhoyuk diaries: What I did on my summer holidays

I was on site at Çatalhöyük for two weeks, and while I was there I contributed to the Catalhoyuk blog.

For me, it was a good opportunity to explain what I do on site – people are often confused about why a database developer would be going out to work on site in Turkey.

After that, I spent a few days in Istanbul, where I went to the Çatalhöyük exhibition and also to Istanbul Modern.

Then I caught a train to Bucharest, where I started my holiday. I visited Romania, Moldova, Transdniestr, Ukraine, and finally flew home from Krakow a month later.

Catalhoyuk diaries: August 2006

It's hard to believe my time here has flown so quickly. I’m sad to be leaving on Friday, and I feel like there’s not enough time to get done everything I want to before I go. The Istanbul team left today and some specialists are starting to leave, so the site already feels a little like it’s winding down for the season.

This season I've mostly been concentrating on working through recording structures and developing the interface for a unified Clay Objects database application.

I've been talking about my ideas for this application for the past two years, so it's great to finally get a chance to make it real. I've had lots of intense discussions with pottery, figurines and building materials specialists, as well as a new member of the team who is looking at the changes in the fabric of the site across artefact types and excavation processes.

Working with someone who'll actually benefit from this unified approach to recording has been a real advantage, particularly as they have a background in geology and bring a lot of technical expertise to the project. Together we've been re-examining the existing recording so that, as far as possible, clay fabrics or matrices are recorded according to diagnostic evidence.

I've also been working with the Human Remains team on their database, but they've made my job easy by developing their own forms based on the data structures I sent from London. They've been really inspiring – it's one thing to tell people that they have the power to create their own interfaces and queries, but it's so much better to see people actually do it.

In between that (and sometimes the development happens in between solving other problems) I've been dealing with smaller fixes to existing databases, network issues and all the kinds of things that come up when people ask you computer-related questions.

My brain has been working overtime; there are so many new ways of interrogating the databases now that everything is becoming integrated that the possibilities seem endless. We’ve had a few seminars, and people presenting their PhD research, and I come out with ideas for new improvements every time. It’s also been fascinating hearing how their data translates into a picture of life lived on the mound.

The model of creating 'new' database applications by combining existing data across specialisms with new interpretation and specialist recording will be the basis for the architectural and beads databases, and I'm already excited by that. One idea that fascinates me is recording things like the wall paintings in the database so they can be linked to other representational artefacts like figurines and stamp seals. A 'representational database' could look at similarities and differences in images across any materials or artefact types. Do wall paintings show the same kinds of artistic, personal or cultural concerns as the figurines? Do certain types of features occur with certain kinds of representational artefacts?

Anyway, I don’t want to hog the internet computer, so it’s back to work for me.

[Originally published on http://www.catalhoyuk.com/blog/, August 2, 2006]

Catalhoyuk diaries: Settling in

It's my first proper working day on site this season and I'm slowly working my way through Sarah's documentation of the database work and general IT issues she's encountered while she was here. At this stage, I only have one major 'new' application to work on, and in large part that's thanks to Sarah's hard work over the past months, both on- and off-site.

I'm hoping that now that the hard grunt work of centralising, bug fixing, cleaning and consolidating the existing databases over the past few years is (mostly) over, and the applications I created in previous years are bedded in, I'll have a real chance to think about what else we can do with all this data. I was so busy before I left London that I hadn't really had a chance to get excited about coming back to Catalhoyuk but as soon as I was on my way I realised that this could be an immensely intellectually rich and rewarding two weeks.

There's always so much new technology; I'm sure there's a knack to not getting carried away by every new possibility. But I can't help but wonder what would happen if we recreated Catalhoyuk in Second Life or another 3D world. Imagine re-populating the mound with a living community of real people!

I'd love to see how we could use semantic web/Web 2.0 technologies to open up our data to the rest of the world. I'm interested in the tagging technologies emerging through folksonomy projects like steve.museum, and wonder if we could apply them to the finds data we publish on the web.

I've realised that you could almost think of the excavation diary entries as blog posts, in which case Catalhoyuk has a blog that goes back to 1997.

[Originally published on http://www.catalhoyuk.com/blog/, July 22, 2006]