‘…and they all turn on their computers and say “yay!”’ (aka, ‘mapping for humanists’)

I’m spending a few hours of my Sunday experimenting with ‘mapping for humanists’ with an art historian friend, Hannah Williams (@_hannahwill).  We’re going to have a go at solving some issues she has encountered when geo-coding addresses in 17th and 18th Century Paris, and we’ll post as we go to record the process and hopefully share some useful reflections on what we found as we tried different tools.

We started by working out what issues we wanted to address.  After some discussion we boiled it down to two basic goals: a) to geo-reference historical maps so they can be used to geo-locate addresses and b) to generate maps dynamically from a list of addresses. This also means dealing with copyright and licensing issues along the way and thinking about how geospatial tools might fit into the everyday working practices of a historian.  (i.e. while a tool like Google Refine can easily generate maps, is it usable for people who are more comfortable with Word than relying on cloud-based services like Google Docs?  And if copyright is a concern, is it as easy to put points on an OpenStreetMap as on a Google Map?)

Like many historians, Hannah’s use of maps fell into two main areas: maps as illustrations, and maps as analytic tools.  Maps used for illustrations (e.g. in publications) are ideally copyright-free, or can at least be used as illustrative screenshots.  Interactivity is a lower priority for now as the dataset would be private until the scholarly publication is complete (owing to concerns about the lack of an established etiquette and format for citation and credit for online projects).

Maps used for analysis would ideally support layers of geo-referenced historic maps on top of modern map services, allowing historic addresses to be visually located via contemporaneous maps and geo-located via the link to the modern map.  Hannah has been experimenting with finding location data via old maps of Paris in Hypercities, but manually locating 18th Century streets on historic maps then matching those locations to modern maps is time-consuming and she suspects there are more efficient ways to map old addresses onto modern Paris.

Based on my research interviews with historians and my own experience as a programmer, I’d also like to help humanists generate maps directly from structured data (and ideally to store their data in user-friendly tools so that it’s as easy to re-use as it is to create and edit).  I’m not sure if it’s possible to do this from existing tools or whether they’d always need an export step, so one of my questions is whether there are easy ways to get records stored in something like Word or Excel into an online tool and create maps from there.  Some other issues historians face in using mapping include: imprecise locations (e.g. street names without house numbers); potential changes in street layouts between historic and modern maps; incomplete datasets; using markers to visually differentiate types of information on maps; and retaining descriptive location data and other contextual information.
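To make the ‘export step’ question concrete, here is a minimal sketch of turning a spreadsheet saved as CSV into GeoJSON, a format many web mapping tools can display. It is only an illustration under stated assumptions: the column names (name, address, kind, lat, lon), the sample rows and the coordinates are all invented for the example, and it assumes latitude and longitude have already been found for at least some rows. Rows without coordinates (e.g. a street name with no house number) are set aside rather than guessed at, and the descriptive fields travel with each point so contextual information isn’t lost.

```python
import csv
import io
import json


def rows_to_geojson(csv_text):
    """Convert CSV text with name, address, kind, lat, lon columns to GeoJSON.

    Returns the GeoJSON dict plus the names of rows that had no coordinates.
    The column names are assumptions made for this sketch.
    """
    features = []
    unlocated = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = (row.get("lat") or "").strip()
        lon = (row.get("lon") or "").strip()
        if not lat or not lon:
            # Imprecise or missing location: keep a note, don't invent a point.
            unlocated.append(row["name"])
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            # 'kind' could later drive different marker styles on the map.
            "properties": {
                "name": row["name"],
                "address": row["address"],
                "kind": row["kind"],
            },
        })
    return {"type": "FeatureCollection", "features": features}, unlocated


# Invented sample data, standing in for an export from Excel.
SAMPLE = """name,address,kind,lat,lon
Artist A,rue Saint-Honoré,studio,48.8625,2.3350
Artist B,rue de Seine,lodging,,
"""

collection, unlocated = rows_to_geojson(SAMPLE)
print(json.dumps(collection, ensure_ascii=False, indent=2))
print("Still to locate:", unlocated)
```

The point of the sketch is that the ‘export’ can be a very small step if the original records are kept in consistent columns; the hard part remains finding the coordinates in the first place.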

Because the challenge is to help the average humanist, I’ve assumed we should stay away from software that needs to be installed on a server, so to start with we’re trying some of the web-based geo-referencing tools listed at http://help.oldmapsonline.org/georeference.

Geo-referencing tools for non-technical people

The first bump in the road was finding maps that are re-usable in technical and licensing terms so that we could link or upload them to the web tools listed at http://help.oldmapsonline.org/georeference.  We’ve fudged it for now by using a screenshot to try out the tools, but it’s not exactly a sustainable solution.  
Hannah’s been trying georeferencer.org, Hypercities and Heurist (thanks to Lise Summers ‏@morethangrass on twitter) and has written up her findings at Hacking Historical Maps… or trying to.  Thanks also to Alex Butterworth @AlxButterworth and Joseph Reeves @iknowjoseph for suggestions during the day.

Yahoo! Mapmixer’s page was a 404 – I couldn’t find any reference to the service being closed, but I also couldn’t find a current link for it.

Next I tried Metacarta Labs’ Map Rectifier.  Any maps uploaded to this service are publicly visible, and though the site says this does ‘not grant a copyright license to other users’, it also warns that ‘[t]here is no expectation of privacy or protection of data’, which may be a concern for academics negotiating the line between openness and protecting work-in-progress or anyone dealing with sensitive data.  Many of the historians I’ve interviewed for my PhD research feel that some sense of control over who can view and use their data is important, though the reasons why and how this is manifested vary.

Screenshot from http://labs.metacarta.com/rectifier/rectify/7192


The site has clear instructions – ‘double click on the source map… Double click on the right side to associate that point with the reference map’ – but the search within the right-hand side ‘reference map’ didn’t work, and manually navigating to Paris, then to the right section of Paris, was a huge pain.  Neither of the base maps seemed to have labels, so finding the right location at the right level of zoom was too hard and eventually I gave up.  Maybe the service isn’t meant to deal with that level of zoom?  We were using a very small section of map for our trials.

Inspired by Metacarta’s Map Rectifier, Map Warper was written with OpenStreetMap in mind, which immediately helps us get closer to the goal of images usable in publications.  Map Warper is also used by the New York Public Library, which described it as a ‘tool for digitally aligning (“rectifying”) historical maps … to match today’s precise maps’.  Map Warper also makes all uploaded maps public: ‘By uploading images to the website, you agree that you have permission to do so, and accept that anyone else can potentially view and use them, including changing control points’, but also offers ‘Map visibility’ options ‘Public(default)’ and ‘Don’t list the map (only you can see it)’.

Screenshot showing ‘warped’ historical map overlaid on OpenStreetMap at http://mapwarper.net/

Once a map is uploaded, it zooms to a ‘best guess’ location, presumably based on the information you provided when uploading the image.  It’s a powerful tool, though I suspect it works better with larger images with more room for error.  Some of the functionality is a little obscure to the casual user – for example, the ‘Rectify’ view tells me ‘[t]his map either is not currently masked. Do you want to add or edit a mask now?’ without explaining what a mask is.  However, I can live with some roughness around the edges because once you’ve warped your map (i.e. aligned it with a modern map), there’s a handy link on the Export tab, ‘View KML in Google Maps’ that takes you to your map overlaid on a modern map.  Success!
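For anyone curious about what ‘warping’ involves underneath, here is a minimal sketch of the simplest case: using three control points to work out a transformation from pixel positions on a scanned map to longitude and latitude. This is not Map Warper’s actual code (real tools typically accept many more control points and offer other transformation types); the function names, pixel positions and coordinates below are all made up for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(matrix, rhs):
    """Solve a 3x3 linear system using Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("control points lie on a line; pick three that form a triangle")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Build a pixel -> (lon, lat) function from exactly three control points.

    control_points is a list of ((pixel_x, pixel_y), (lon, lat)) pairs.
    """
    if len(control_points) != 3:
        raise ValueError("this sketch handles exactly three control points")
    matrix = [[float(px), float(py), 1.0] for (px, py), _ in control_points]
    lon_coeffs = solve3(matrix, [geo[0] for _, geo in control_points])
    lat_coeffs = solve3(matrix, [geo[1] for _, geo in control_points])

    def pixel_to_lonlat(px, py):
        lon = lon_coeffs[0] * px + lon_coeffs[1] * py + lon_coeffs[2]
        lat = lat_coeffs[0] * px + lat_coeffs[1] * py + lat_coeffs[2]
        return lon, lat

    return pixel_to_lonlat


# Invented control points: three corners of a 1000 x 1000 pixel scan.
control = [
    ((0, 0), (2.30, 48.90)),
    ((1000, 0), (2.40, 48.90)),
    ((0, 1000), (2.30, 48.80)),
]
to_lonlat = fit_affine(control)
print(to_lonlat(500, 500))  # the centre of the scan, roughly (2.35, 48.85)
```

An affine fit like this can only shift, scale, rotate and skew the image, which helps explain why a small, distorted scrap of an old map is harder to align well than a larger sheet with more places to pin down.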

Sadly not all the export options seem to be complete (they weren’t working on my map, anyway) so I couldn’t work out if there was a non-geek friendly way to open the map in OpenStreetMap.

We have to stop here for now, but at this point we’ve met one of the goals – to geo-reference historical maps so locations from the past can be found in the present, but the other will have to wait for another day.  (But I’d probably start with openheatmap.com when we tackle it again.  Any other suggestions would be gratefully received!)

(The title quote is something I heard one non-geek friend say to another to explain what geeks get up to at hackdays. We called our experiment a ‘hackday’ because we were curious to see whether the format of a hackday – working to meet a challenge within set parameters within a short period of time – would work for other types of projects. While this ended up being almost an ‘anti-hack’, because I didn’t want to write code unless we came across a need for a generic tool, the format worked quite well for getting us to concentrate solidly on a small set of problems for an afternoon.)

Quick and dirty Digital Humanities Australasia notes: day 2

What better way to fill in stopover time in Abu Dhabi than continuing to post my notes from DHA2012? [Though I finished off the post and re-posted once I was back home.] These are my very rough notes from day 2 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Slow and still dirty Digital Humanities Australasia notes: day 3). In the interests of speed I’ll share my notes and worry about my own interpretations later.

Keynote panel, ‘Big Digital Humanities?’

Day 2 was introduced by Craig Bellamy, and began with a keynote panel with Peter Robinson, Harold Short and John Unsworth, chaired by Hugh Craig. [See also Snurb’s liveblogs for Robinson, Short and Unsworth.] Robinson asked ‘what constitutes success for the digital humanities?’ and further, what do the visible successes of digital humanities mask? He said it’s harder for scholars to do high quality research with digital methods now than it was 20 years ago. But the answer isn’t more digital humanists, it’s having the ingredients to allow anyone to build bridges… He called for a new generation of tools and methods to support the scholarship that people want to do: ‘It should be as easy to make a digital edition (of a document/book) as it is to make a Facebook page’; it shouldn’t require collaboration with a digital humanist. To allow data made by one person to be made available to others, all digital scholarship should be made available under a Creative Commons licence (publishers can’t publish it now if it’s under a non-commercial licence), and digital humanities data should be structured and enriched with metadata and made available for re-use with other tools. The model for sustainability depends on anyone and everyone being able to access data.

Harold Short talked about big (or at least inescapable) data and the ‘Svensson challenge’ – rather than trying to work out how to take advantage of infrastructure created by and for the sciences, use your imagination to figure out what’s needed for the arts and humanities. He called for a focus on infrastructure and content rather than ‘data’.

John Unsworth reminded us that digital humanities is a certain kind of work in the humanities that uses computational methods as its research methods. It’s not just using digital materials, though it does require large collections of data – it also requires a sense of how the tools work.

What is the digital humanities?

Very different versions of ‘digital humanities’ emerged through the panel and subsequent discussion, leaving me wondering how they related to the different evolutionary paths of digital history and digital literature studies mentioned the day before. Meanwhile, on the back channel (from the tweets that are to hand), I wondered if a two-tier model of digital humanities was emerging – one that uses traditional methods with digital content (DH lite?); another that disrupts traditional methods and values. Though thinking about it now, the ‘tsunami’ of data mentioned is disruptive in its own right, regardless of the intentional choices one makes about research practices (which might have been what Alan Liu meant when he asked about ‘seamless’ and ‘seamful’ views of the world)…. On twitter, other people (@mikejonesmelb, @bestqualitycrab, @1n9r1d) wondered if the panel’s interpretation of ‘big’ data was gendered, generational, sectoral, or any other combination of factors (including the messiness and variability of historical data compared to literature) and whether it could have been about ‘disciplinary breadth and inclusiveness’ rather than scale.

Data morning session

The first speaker was Toby Burrows on ‘Using Linked Data to Build Large‐Scale e‐Research Environments for the Humanities’. [Update: he’s shared his slides and paper online and see also Snurb’s liveblog.] Continuing some of the themes from the morning keynote panel, he said that the humanities have already been washed away in the digital deluge: the proliferation of digital stuff is beyond the capacity of individual researchers. It’s difficult to answer complex humanities questions only using search with this ‘industrialised’ humanities data, but large-scale digital libraries and collections offer very little support for functions other than search. There’s very little connection between data that researchers are amassing and what institutions are amassing.

He’s also been looking at historians’/humanists’ research practices [and selfishly I was glad to see many parallels with my own early findings]. The tools may be digital rather than paper and scissors, but historians are still annotating and excerpting as they always have. The ‘sharing’ part of their work has changed the most – it’s easier to share, and they can share at an earlier stage if they choose to do that, but not a lot has changed at the personal level.

Burrows said applying a linked data approach to manuscript research would go a long way to addressing the complexity of the field. For example, using global URIs for manuscripts and parts; separating names and concepts from descriptive information; and using linked data functions to relate scholarly activities (annotations, excerpts, representations etc) to manuscript descriptions, objects and publications. Linked data can provide a layer of entities that sits between research activities and descriptions/collections/publications, which avoids conflating the entities and the source material. Multiple naming schemes are necessary for describing entities and relationships – there’s no single authoritative vocabulary. It’s a permanent work in progress, with no definitive or final structure. Entities need to include individuals as well as categories, with a network graph showing relatedness and the evidence for that relatedness as the basic structure.
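As a rough illustration of that ‘layer of entities’ idea (my own sketch, not anything shown in the talk or taken from HuNI), the statements can be thought of as subject, predicate, object triples in which manuscripts, their parts, people and annotations each get their own URI. Everything below is hypothetical: the example.org namespace, the vocabulary terms and the entities themselves.

```python
from collections import defaultdict

BASE = "http://example.org/"  # placeholder namespace, not a real service

# Each statement is (subject, predicate, object). Names and labels are kept
# as separate statements rather than being buried in a description.
triples = [
    (BASE + "manuscript/1", BASE + "vocab/hasPart", BASE + "manuscript/1/folio/12"),
    (BASE + "manuscript/1", BASE + "vocab/label", "Example Book of Hours"),
    (BASE + "person/7", BASE + "vocab/label", "Example Scribe"),
    # A scholarly activity (an annotation) is an entity in its own right,
    # linked to the thing it annotates and the person it mentions.
    (BASE + "annotation/3", BASE + "vocab/annotates", BASE + "manuscript/1/folio/12"),
    (BASE + "annotation/3", BASE + "vocab/mentions", BASE + "person/7"),
    (BASE + "annotation/3", BASE + "vocab/createdBy", BASE + "researcher/42"),
]


def index_by_object(statements):
    """Map each object to the (subject, predicate) pairs that point at it."""
    incoming = defaultdict(list)
    for subject, predicate, obj in statements:
        incoming[obj].append((subject, predicate))
    return incoming


def things_pointing_at(entity, statements):
    """Return the sorted subjects of every statement whose object is the entity."""
    return sorted({subject for subject, _ in index_by_object(statements)[entity]})


print(things_pointing_at(BASE + "person/7", triples))
print(things_pointing_at(BASE + "manuscript/1/folio/12", triples))
```

Because the person and the folio are entities rather than strings inside a catalogue description, the same annotation can be found from either direction, and a second naming scheme could be added as more statements without rewriting the first.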

He suggested a focus on organising knowledge, not collections, whether objects or texts. Collaborative activities should be based around this knowledge, using tools that work with linked data entities. This raised the issue of contested ground and the application of labels and meaning to data: your ‘discovery’ is my ‘invasion’. This makes citizen humanities problematic – who gets to describe, assign, link, and what does that mean for scholarly authority?

My notes aren’t clear but I think Burrows said these ideas were based on analysis of medieval manuscript research, which Jane Hunter had also worked on, and they were looking towards the architecture for HuNI. It was encouraging to see an approach to linked data so grounded in the complexity of historians’ research practices and data, and is yet another reason I’m looking forward to following HuNI’s progress – I think it will have valuable lessons for linked data projects in the rest of the world. [These slides from the Linked Open Data workshop in Melbourne a few weeks later show the academic workflow HuNI plans to support and some of the issues they’ll have to tackle.]

The second speaker was the University of Sydney’s Stephen Hayes on ‘how linked is linked enough?’. [See also Snurb’s liveblog.] He’s looking at projects through a linked data lens, trying to assess how much further projects need to go to comfortably claim to be linked data. He talked about the issues projects encountered trying to get to be 5 star Linked Data.

He looked at projects like the Dictionary of Sydney, which expresses data as RDF as well as in a public-facing HTML interface and comes close to winning 5 stars. It is a demonstration of the fact that once data is expressed in one form, it can be easily expressed in another form – stable entities can be recombined to form new structures. The project is powered by Heurist, a tool for managing a wide range of research data. The History of Balinese Painting could not find other institutions that exposed Balinese collection data in programmable form so they could link to them (presumably a common problem for early adopters but at least it helps solve the ‘chicken or the egg’ problem that dogs linked data in cultural heritage and the humanities). The site’s URLs don’t return useful metadata but they do try to refer to image URLs so it’s ‘sorta persistent’. He gave it a rating of 3.5 stars. Other projects mentioned (also built on Heurist?) were the Charles Harpur Critical Archive, rated at 3.5 stars and Virtual Zagora, rated at 3 stars.
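The ‘one form, then another’ point is easy to show in miniature. The sketch below takes a single structured record and writes it out twice: once as N-Triples (a simple line-based RDF format) and once as a scrap of HTML for human readers. It is not how Heurist or the Dictionary of Sydney actually work; the record and its example.org URI are invented, and the choice of rdfs:label and Dublin Core’s description property is just one plausible mapping.

```python
from html import escape

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCTERMS_DESCRIPTION = "http://purl.org/dc/terms/description"

# An invented record with a stable identifier.
record = {
    "uri": "http://example.org/place/1",
    "label": "Example Street",
    "description": "A made-up entry used only for illustration.",
}


def ntriples_literal(text):
    """Quote a plain string as an N-Triples literal, escaping the basics."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_ntriples(rec):
    """Machine-facing view: one statement per line."""
    lines = [
        "<{}> <{}> {} .".format(rec["uri"], RDFS_LABEL, ntriples_literal(rec["label"])),
        "<{}> <{}> {} .".format(
            rec["uri"], DCTERMS_DESCRIPTION, ntriples_literal(rec["description"])
        ),
    ]
    return "\n".join(lines)


def to_html(rec):
    """Human-facing view of exactly the same record."""
    return '<article><h1><a href="{}">{}</a></h1><p>{}</p></article>'.format(
        escape(rec["uri"], quote=True), escape(rec["label"]), escape(rec["description"])
    )


print(to_ntriples(record))
print(to_html(record))
```

Both outputs hang off the same stable URI, which is the part that takes real project effort to get right; the serialising itself is the easy bit.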

The paper was an interesting discussion of the teamwork required to get the full 5 stars of linked data, and the trade-offs in developing functions for structured data (e.g. implementing schema.org’s painting markup versus focussing on the quality of the human-facing pages); reassuring curators about how much data would be released and what would be kept back; developing ontologies throughout a project or in advance; and the overhead in mapping other projects’ concepts to their own version of Dublin Core.

The final paper in the session was ‘As Curious An Entity: Building Digital Resources from Context, Records and Data’ by Michael Jones and Antonina Lewis (abstract). [See also Snurb’s liveblog.] They said that improving the visibility of relationships between entities enriches archives, as does improving relationships between people. The title quote in full is ‘as curious an entity as bullshit writ on silk’ – if the parameters, variables and sources of data are removed from material, then it’s just bullshit written on silk. Visualisations remove sources, complexity and ‘relative context’, and would be richer if they could express changes in data over time and space. They asked how one would know that information presented in a visualisation is accurate if it doesn’t cite sources? You must seek and reference original material to support context layers.

They presented an overview of the Saulwick Archive project (Saulwick ran polls for the Fairfax newspapers for years) and the Australian Women’s Register, discussed common issues faced in digital humanities, and the role of linked data and human relationships in building digital resources. They discussed the value of maintaining relationships between archives and donors after the transfer of material, and the need to establish data management plans to make provision for raw data and authoritative versions of related contextual material, and to retain data to make sense of the archives in the future. The Australian Women’s Register includes content written for the site and links out to the archival repositories and libraries where the records are held. In a lovely phrase, they described records as the ‘evidential heart’ for the context and data layers. They also noted that the keynote overlooked non-academic re-use of digital resources, but it’s another argument for making data available where possible.

Digital histories session

The first paper was ‘Community Connections: The Renaissance of Local History’ by Lisa Murray. Murray discussed the ‘three Cs’ needed for local history: connectivity, community, collaboration.

Is the process of geo-referencing forcing historians to be more specific about when or where things happened? Are people going from the thematic to the particular? Is it exciting for local historians to see how things fit into state or national narratives? Digital history has enormous potential for local and family history and to represent complicated relationships within a community and how they’ve changed over time. Digital history doesn’t have to be article-centric – it enables new forms of presentation. Historians have to acknowledge that Wikipedia is aligned to historians’ processes. Local history is strongly represented on Wikipedia. The Dictionary of Sydney provides a universal framework for accessing Sydney’s history.

The democratisation of historical production is exciting but raises challenges for public understandings of how history is undertaken and represented. Are some histories privileged? Making History (a project by Museum Victoria and Monash University) encourages the use of online resources but does that privilege digitised sources, and will others be neglected? Are easily accessible sources privileged, and does that change what history is written? What about community collections or vast state archives that aren’t digitised?

History research methodologies are changing – Google etc is shaping how research is undertaken; the ubiquity of keyword searching reinforces the primacy of names. She noted the impact of family historians on how archives prioritise work. It’s not just about finding sources – to produce good history you need to analyse the sources. Professional historians are no longer the privileged producers of knowledge. History can be parochial, inclusive, but it can also lack a sense of historical perspective and context. Digital history production amplifies tensions between popular history and academic history [and presumably between amateur and academic historians?].

Apparently primary school students study more local history than university students do. Local and community history is produced by a broad spectrum of the community but relatively few academic historians are participating. There’s a risk of favouring quirky facts over significance and context. Unless history is more widely taught, local history will be tarred with the same brush as antiquarians. History is not only about narrative and context… Historians need to embrace the renaissance of local and community history.

In the questions there was some discussion of the implications of Sydney’s city archives being moved to a more inconvenient physical location. The justification is that it’s available through Ancestry but that removes it from all context [and I guess raises all the issues of serendipity etc in digital vs physical access to archives].

The next speaker was Tim Sherratt on ‘Inside the bureaucracy of White Australia’. His slides are online and his abstract is on the Invisible Australians site. The Invisible Australians project is trying to answer the question of what the White Australia policy looked like to a non-white Australian.  He talked about how digital technology can help explore the practice of exclusion as legislation and administrative processes were gradually elaborated. Chinese Australians who left Australia and wanted to return had to prove both their identity and their right to land to convince officials they could return: ‘every non-white resident was potentially a prohibited immigrant just waiting to be exposed’. He used topic modelling on file titles from archival series and was able to see which documents related to the White Australia policy. This is a change from working through hierarchical structures of archives to working directly through the content of archives. This provides a better picture of what hasn’t survived, what’s missing and would have many other exciting uses. [His post on Topic modelling in the archives explains it better than my summary would.]

The final paper was Paul Turnbull on ‘Pancake history’. He noted that in e-research there’s a difference between what you can use in teaching and what makes people nervous in the research domain. He finds it ironic that professional advancement for historians is tied to writing about doing history rather than doing history. He talked about the need to engage with disciplinary colleagues who don’t engage with digital humanities, and issues around historians taking digital history seriously.

Sherratt’s talk inspired discussion of funding small-scale as well as large-scale infrastructure, possibly through crowdfunding. Turnbull also suggested ‘seeding ideas and sharing small apps is the way to go’.

[Note from when I originally posted this: I don’t know when my flight is going to be called, so I’ll hit publish now and keep working until I board – there’s lots more to fit in for day 2! In the afternoon I went to the ‘Digital History’ session. I’ll tidy up when I’m in the UK as I think blogger is doing weird LTR things because it may be expecting Arabic.]

See also Slow and still dirty Digital Humanities Australasia notes: day 3.

Quick and dirty Digital Humanities Australasia notes: day 1

As always, I should have done this sooner and tidied them up more, but better rough notes than nothing, so here goes… The Australasian Association for Digital Humanities held their inaugural conference in Canberra in March, 2012.  You can get an overall sense of the conference from the #DHA2012 tweets (I’ve put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it’s not on Australian time) and from the keynotes.

In his opening keynote on the movements between close and distant reading, Alan Liu observed that the crux of the ‘reading’ issue depends on the field, and further, that ‘history is on a different evolutionary branch of digital humanities to literary studies’.  This is something I’ve been wondering about since finding myself back in digital humanities, and was possibly reflected in the variety of papers in the overall programme.  I was generally following sessions on digital history, geospatial themes and crowdsourcing, but there was so much in the programme that you could have followed a literary studies line and had a totally different conference experience.

In the next session I went to a panel on ‘Connecting Australia’s Cultural Datasets: A Vision for Collaboration’ with various people from the new ‘Humanities Networked Infrastructure’ (HuNI) (more background) presenting.  It started with Deb Verhoeven on ‘jailbreaking cultural data’ and the tension identified by Brand: “information wants to be expensive because it’s so valuable.  The right information in the right place just changes your life.  On the other hand, information wants to be free, because the cost of getting it out is lower and lower all the time. So you have these two things fighting against each other”. ‘Information wants to be social’: she discussed the need to understand the value of research in terms of community engagement, not just as academically ranked output, and to return research to the communities they’re investigating in meaningful ways.
 
Other statements that resonated were the need for organisational, semantic and technical interoperability in datasets to create collaborative environments. Collaboration requires data integration and exchange as well as dealing with different ideas about what ‘data’ is in different disciplines in the humanities. Collaboration in the cultural datasets community can follow unmet needs: discover data that’s currently hidden, make connections between disparate data sources, publish and share connections.

Ross Harley talked about how interoperability facilitates serendipity and trying to find new ways for data to collide. In the questions, Ingrid Mason asked about parallels with the GLAM (galleries, libraries, archives and museums) community, but it was also pointed out that GLAMs are behind in publishing their data – not everything HuNI wants to use is available yet.  I pointed out (on the twitter back channel) that requests for GLAM information from intensive users (e.g. researchers) helps memory institutions make the case for publishing more data – it’s still all a bit chicken-or-the-egg.

After lunch I went to the crowdsourcing session (not least cos I was presenting early results from my PhD in it).  The first presentation was on ‘crowdsourcing semantic tags on 3D museum artefacts’ which could have amazing applications for teaching material culture and criticism as well as source communities because it lets people annotate specific locations on a 3D model. Interestingly, during the questions someone reported people visiting the campus classics museum who said they were enjoying seeing the objects in person but also wanted access to electronic versions – it’s fascinating watching audience expectations change.

The next presentation was on ‘Optimising crowdsourcing websites to increase volunteer participation’ which was a case study of NYPL’s What’s on the menu by Donelle McKinley who was using MECLAB/Flint McGlaughlin’s Conversion Sequence heuristic (clarity of value proposition, motivation, incentive, friction, anxiety) to assess how the project’s design was optimised to motivate audience participation.  Donelle’s analysis is really useful for people thinking about designing for crowdsourcing, but I’m not sure my notes do it justice, and I’m afraid I didn’t get many notes for Pauline Cockrill’s ‘Using Web 2.0 to make new connections in community history’ as I was on just afterwards.  One point I tweeted was about a quick win for crowdsourcing in using real-world communities as pointers to successful online collaborations, but I’m not sure now who said it.

One comment I noted during the discussion was “a real pain about Old Weather was that you’d get into working on a ship and it would just sail off on you” – interfaces that work for the organisation don’t always work for the audience.  This session was generally useful for clarifying my thoughts on the tension between optimising for efficiency or engagement in cultural heritage crowdsourcing projects.

In the interests of getting this posted I’ll stop here and call this ‘day 1’. I’m not sure if any of the slides are available yet, but I’ll update and link to any presentations or other write-ups I find. There’s a live blog of many sessions at http://snurb.info/taxonomy/term/137.

[Update: I’ve posted about Day 2 at Quick and dirty Digital Humanities Australasia notes: day 2 and Slow and still dirty Digital Humanities Australasia notes: day 3.]

Sunrise to sunset on the day of digital humanities

[I’ve copied my post from the official Day of DH (‘A Day of the Life of the Digital Humanities’) 2012 site so it can be integrated with my other posts on digital humanities and general blogging.]

Gumtrees in carparks. Just one of the things I miss about Australia.

I feel like a bit of a cheat, as through an accident of timing my Day of Digital Humanities has been far, far more glamorous than my usual working day (which tends to involve sitting at a desk in Oxford or Milton Keynes analysing websites; reading books, blog posts and articles; or interviewing people for my PhD).

But today I happened to be in Australia so it was all a bit more exciting…  I left for Sydney’s Central station as the sun was rising, heading for the 8am bus to Canberra. On the bus I tidied my Digital Humanities Australasia 2012 (DHA2012) conference paper and slides for tomorrow’s presentation on historians and crowdsourcing, and wrote a blog post about the week just past (Geek for a week: residency at the Powerhouse Museum).

After checking into my room at the ANU (Australian National University), I scanned my email for anything vital, uploaded my draft blog post and hit publish, then tweeted the link as I headed over to the National Museum of Australia where I was taking part in a playtest for a new game called Sembl.

Play-testing Sembl on iPads

After the playtest was over we walked back to the ANU campus for a DHA2012 drinks reception and a LODLAM (linked open data in libraries, archives and museums) mini-bar meetup. An early night for me so I’m sorted for the first day of DHA2012 tomorrow!

Defining Digital Humanities
I was asked to define ‘digital humanities’ when I signed up for this site, which came as a bit of a surprise, and I don’t think I did a terribly good job. So here’s another, very personal definition based on my work in digital history and digital heritage:

Digital humanities is thinking through making, as well as writing… for me, it’s currently about thinking critically about the impact of digitality on scholarly practice in addition to applying digital techniques to the concerns of the humanities.

Notes on current issues in Digital Humanities

In July 2011, the Open University held a colloquium called ‘Digital technologies: help or hindrance for the humanities?’, in part to celebrate the launch of the Thematic Research Network for Digital Humanities at the OU.  A full multi-author report about the colloquium (titled ‘Colloquium: Digital Technologies: Help or Hindrance for the Humanities?’) will be coming out in the ‘Digital Futures Special Issue Arts and Humanities in HE’ edition of Arts and Humanities in Higher Education soon, but a workshop was also held at the OU’s Milton Keynes campus on Thursday to discuss some of the key ideas that came from the colloquium and to consider the agenda for the thematic research network.  I was invited to present in the workshop, and I’ve shared my notes and some comments below (though of course the spoken version varied slightly).

To help focus the presentations, Professor John Wolffe (who was chairing) suggested we address the following points:

  1. What, for you, were the two most important insights arising from last July’s colloquium?
  2. What should be the two key priorities for the OU’s DH thematic research network over the next year, and why?
Notes on the colloquium and current issues in the Digital Humanities
 

Introduction – who I am as context for how I saw the colloquium
Before I started my PhD, I was a digital practitioner – a programmer, analyst, bearer of Zeitgeisty made-up modern job titles – situated in an online community of technologists loosely based in academia, broadcasting, libraries, archives, and particularly, in public history and museums. That’s really only interesting in the context of this workshop because my digital community is constituted by the very things that challenge traditional academia – ad hoc collaboration, open data, publicly sharing and debating thoughts in progress.

For people who happily swim in this sea, it’s hard to realise how new and scary it can be, but just yesterday I was reminded how challenging the idea of a public identity on social media is for some academics, let alone the thought of finding time to learn and understand yet another tool. As a humanist-turned-technologist-turned-humanist, I have sympathy for the perspective of both worlds.

The two most important insights arising from last July’s colloquium?
John Corrigan’s introduction made it clear that the answer to the question ‘what is digital humanities’ is still very open, and has perhaps as many different answers as there are humanists. That’s both exciting and challenging – it leaves room for the adaptation (and adoption) of DH by different humanities disciplines, but it also makes it difficult to develop a shared language for collaboration, for critiquing and peer reviewing DH projects and outputs… [I’ve also been wondering whether ‘digital humanities’ would eventually devolve into the practices of disciplines – digital history, etc – and how much digital humanities really works across different humanities disciplines in a meaningful way, but that’s a question for another day.]

In my notes, it was the discussion around Chris Bissel’s paper on ‘Reality and authenticity’, Google Earth and archaeology that also stood out – the questions about what’s lost and gained in the digital context are important, but, as a technologist, I ask us to be wary of false dichotomies. There’s a danger in conflating the materiality of a resource, the seductive aura of an original document, the difficulties in accessing it, in getting past the gatekeepers, with the quality of the time spent with it; with the intrinsic complexity of access, context, interpretation… The sometimes difficult physical journey to an archive, or the smell of old books is not the same as earned access to knowledge.

What should be the two key priorities for the OU’s DH thematic research network over the next year?
[I don’t think I did a very good job answering this, perhaps because I still feel too new to know what’s already going on and what could be added. Also, I’m apparently unable to limit myself to two.]
I tend to believe that the digital humanities will eventually become normalised as just part of how humanities work, but we need to be careful about how that actually happens.

The early adopters have blazed their trails and lit the way, but in their wake, they’ve left the non-early adopters – the ordinary humanist – blinking and wondering how to thrive in this new world. I have a sense that digital humanities is established enough, or at least the impact of digitisation projects has been broad enough, that the average humanist is expected to take on the methods of the digital humanist in their grant and research proposals and in their teaching – but has the ordinary humanist been equipped with the skills and training and the access to technologists and collaborators to thrive? Do we need to give everyone access to DH101?

We need to deal with the challenges of interdisciplinary collaboration, particularly publication models, peer review and the inescapable REF. We need to understand how to judge the processes as well as the products of research projects, and to find better ways to recognise new forms of publication, particularly as technology is also disrupting the publication models that early career researchers used to rely on to get started.

Much of the critique of digital working was about what it let people get away with, or how it risks misleading the innocent researcher. As with anything on a screen, there’s an illusion of accuracy, completeness, neatness. We need shared practices to critique visualisations and discuss what’s really available in database searches, the representativeness of digital repositories, the quality of transcriptions and metadata, the context in which data was created and knowledge produced… Translating the slipperiness of humanities data and research questions into a digital world is a juicy challenge but it’s necessary if the potential of DH is to be exploited, whether by humanities scholars or the wider public who have new access to humanities content.

Digitality is no excuse to let students (or other researchers) get away with sloppy practice. The ability to search across millions of records is important, but you should treat the documents you find as rigorously as you’d treat something uncovered deep in the archives. Slow, deep reading, considering the pages or documents adjacent to the one that interests you, the serendipitous find – these are all still important. But we also need to help scholars find ways to cope with the sheer volume of data now available and the probably unrealistic expectations of complete coverage of all potential sources this may create. So my other key priority is working out and teaching the scholarly practices we need to ensure we survive the transition from traditional to digital humanities.

In conclusion, the same issues – trust, authority, the context of knowledge production – are important for my digital and my humanities communities, but these concepts are expressed very differently in each. We need to work together to build bridges between the practices of traditional academia and those of the digital humanities.

Quick PhD update from InterFace 2011

It feels like ages since I’ve posted, so since I’ve had to put together a 2-minute lightning talk for the InterFace 2011 conference at UCL (for people working in the intersection of humanities and technology), I thought I’d post it here as an update.  I’m a few months into the PhD but am still very much working out the details of the shape of my project and I expect that how my core questions around crowdsourcing, digitisation, geolocation, researchers and historical materials fit together will change as I get further into my research. [Basically I’m acknowledging that I may look back at this and cringe.]

Notes for 2-minute lightning talk, InterFace 2011

‘Crowdsourcing the geolocation of historical materials through participant digitisation’ 

Hi, I’m Mia, I’m working on a PhD in Digital Humanities in the History department at the Open University.

I’m working on issues around crowdsourcing the digitisation and geolocation of historical materials. I’m looking at ‘participant digitisation’ so I’ll be conducting research and building tools to support various types of researchers in digitising, transcribing and geolocating primary and secondary sources.

I’ll also create a spatial interface that brings together the digitised content from all participant digitisers. The interface will support the management of sources based on what I’ve learned about how historians evaluate potential sources.

The overall process has three main stages: research and observation that leads to iterative cycles of designing, building and testing the interfaces, and finally evaluation and analysis of the tools and the impact of geolocated (ad hoc) collections on the practice of historical research.

My PhD proposal (Provisional title: Participatory digitisation of spatially indexed historical data)

[Update: I’m working on a shorter version with fewer long words. Something like crowdsourcing geolocated historical materials/artefacts with specialist users/academic contributors/citizen historians.]

A few people have asked me about my PhD* topic, and while I was going to wait until I’d started and had a chance to review it in light of the things I’m already starting to learn about what else is going on in the field, I figured I should take advantage of having some pre-written material to cover the gap in blogging while I try to finish various things (like, um, my MSc dissertation) that were hijacked by a broken wrist. So, to keep you entertained in the meantime, here it is.

Please bear in mind that it’s already out-of-date in terms of my thinking and sense of what’s already happening in the field – I’m really looking forward to diving into it but my plan to spend some time thinking about the project before I started has been derailed by what felt like a year of having an arm in a cast.

* I never got around to posting about this because my disastrous slip on the ice happened just two days after I resigned, but I’m leaving my job at the Science Museum to take up the offer of a full-time PhD in Digital Humanities at the Open University in mid-March.

Provisional title: Participatory digitisation of spatially indexed historical data

This project aims to investigate ‘participatory digitisation’ models for geo-located historical material.

This project begins with the assumption that researchers are already digitising and geo-locating materials and asks whether it is possible to create systems to capture and share this data. Could the digital records and knowledge generated when researchers access primary materials be captured at the point of creation and published for future re-use? Could the links between materials, and between materials and locations, created when researchers use aggregated or mass-digitised resources, be ‘mined’ for re-use?

Through the use of a case study** based around discovering, collating, transforming and publishing geo-located resources related to early scientific women, the project aims to discover:

  • how geo-located materials are currently used and understood by researchers,
  • what types of tools can be designed to encourage researchers to share records digitised for their own personal use,
  • whether tools can be designed to allow non-geospatial specialists to accurately record and discover geo-spatial references, and
  • the viability of using online geo-coding and text mining services on existing digitised resources.

Possible outcomes include an evaluation of spatially-oriented approaches to digital heritage resource discovery and use; mental models of geographical concepts in relation to different types of historical material and research methods; contributions to research on crowdsourcing digital heritage resources (particularly the tensions between competition and co-operation, between the urge to hoard or share resources) and prototype interfaces or applications based on the case study.

The project also provides opportunities to reflect on what it means to generate as well as consume digital data in the course of research, and on the changes digital opportunities have created for the arts and humanities researcher.

** This case study is informed by my thinking around the possibilities of re-populating the landscape with references to the lives, events, objects, etc, held by museums and other cultural heritage institutions, e.g. outside museum walls, and by an experimental, collaborative project around ‘modern bluestockings’ that aimed to locate and re-display the forgotten stories around unconventional and pioneering women in science, technology and academia.

On ‘cultural heritage technologists’

A Requirements Engineering lecture at uni yesterday discussed ‘satisfaction arguments’ (a way of relating domain knowledge to the introduction of a new system in an environment), emphasising the importance of domain knowledge in understanding user and system requirements – an excellent argument for the importance of cultural heritage technologists in good project design.  The lecture was a good reminder that I’ve been meaning to post about ‘cultural heritage technologists’ for a while. In a report on April’s Museums and the Web 2009, I mentioned in passing:

…I also made up a new description for myself as I needed one in a hurry for moo cards: cultural heritage technologist. I felt like a bit of a dag but then the lovely Ryan from the George Eastman House said it was also a title he’d wanted to use and that made me feel better.

I’d expanded further on it for the first Museums Pecha Kucha night in London:

Museum technologists are not merely passive participants in the online publication process. We have skills, expertise and experience that profoundly shape the delivery of services. In Jakob Nielsen’s terms, we are double domain experts.  This brings responsibilities on two fronts – for us, and for the museums that employ us.

Nielsen describes ‘double usability specialists’ or ‘double experts’ as those with expertise in human-computer interaction and in the relevant domain or sector (e.g. ref).  He found that these double experts were more effective at identifying usability issues, and I’ve extrapolated from that to understand the role of dual expertise in specifying and developing online and desktop applications.
Commenters in the final session of the MW2009 conference described the inability of museums to recognise and benefit from the expertise of their IT or web staff, instead waiting until external gurus pronounced on the way of the future – which turns out to be the same things museum staff had been saying for years.  (Sound familiar?)

So my post-MW2009 ‘call to arms’ said “museums should recognise us (museum technologists) as double domain experts. Don’t bury us like Easter eggs in software/gardens. There’s a lot of expertise in your museum, if you just look. We can save you from mistakes you don’t even know you’re making. Respect our expertise – anyone can have an opinion about the web but a little knowledge is easily pushed too far”.

However, I’m also very aware of our responsibilities. A rough summary might be:

Museum technologists have responsibilities too.  Don’t let recognition as a double domain expert make you arrogant or a ‘know it all’. Be humble. Listen. Try to create those moments of understanding, both yours from conversation with others, and others from conversation with you – and cherish that epiphany.  Break out of the bubble that tech jargon creates around our discussions.  Share your excitement. Explain how a new technology will benefit staff and audiences, show them why it’s exciting. Respect the intelligence of others we work with, and consider it part of our job to talk to them in language they understand. Bring other departments of the museum with us instead of trying to drag them along.

Don’t get carried away with the idea that we are holders of truth; we need to take advantage of the knowledge and research of others. Yes, we have lots of expertise but we need to constantly refresh that by checking back with our audiences and internal stakeholders. We also need to listen to concerns and consider them seriously; to acknowledge and respect their challenges and fears.  Finally, don’t be afraid to call in peers to help with examples, moral support and documentation.

My thoughts on this are still a work in progress, and I’d love to hear what you think.  Is it useful, is it constructive?  Does a label like ‘cultural heritage technologist’ or ‘museum technologist’ help others respect your learning and expertise?  Does it matter?

[Update, April 2012: as the term has become more common, its definition has broadened.  I didn’t think to include it here, but to me, a technologist is more than just a digital producer (as important as they are) – while they don’t have to be a coder, they do have a technical background. Being a coder also isn’t enough to make one a technologist as it’s also about a broad range of experience, ideally across analysis, implementation and support.  But enough about me – what’s your definition?]

The UK as a knowledge-based economy

Rather off-topic, but I wonder what role cultural heritage organisations might have in a knowledge economy. I would imagine that libraries and archives are already leading in that regard, but also that skills currently regarded as belonging to the ‘digital humanities’ will become more common.

In less than three years time, more than half of UK GDP will be generated by people who create something from nothing, according to the 2007 Developing the Future (DtF) report launched today at the British Library.

The report, commissioned by Microsoft and co-sponsored by Intellect, the BCS and The City University, London, sets out the key challenges facing the UK as it evolves into a fully-fledged knowledge-based economy. The report also sets out a clear agenda for action to ensure the UK maintains its global competitiveness in the face of serious challenges.

The report identifies a number of significant challenges that the technology industry needs to address if these opportunities are to be grasped. Primarily, these are emerging markets and skills shortages:

  • At current rates of growth China will overtake the UK in five years in the knowledge economy sector.
  • The IT industry faces a potential skills shortage: The UK’s IT industry is growing at five to eight times the national growth average, and around 150,000 entrants to the IT workforce are required each year. But between 2001 and 2006 there was a drop of 43 per cent in the number of students taking A-levels in computing.
  • The IT industry is only 20 per cent female and currently only 17 per cent of those undertaking IT-related degree courses are women. In Scotland, only 15 per cent of the IT workforce is female.

BCS: Developing the future.

The report also suggests that the ‘IT industry should look to dramatically increase female recruitment’ – I won’t comment for now but it will be interesting to see how that issue develops.

File under ‘fabulous resources that I doubt I’ll ever get time to read properly’: the Journal of Universal Computer Science, D-Lib Magazine (‘digital library research and development, including but not limited to new technologies, applications, and contextual social and economic issues’) and transcripts from the Research Library in the 21st Century symposium.

On the other hand, Introduction to Abject-Oriented Programming is a very quick read, and laugh-out-loud funny (if you’re a tragic geek like me).