academia – Open Objects

Early PhD findings: Exploring historians' resistance to crowdsourced resources

I wrote up some early findings from my PhD research for conferences back in 2012 when I was working on questions around 'but will historians really use resources created by unknown members of the public?'. People keep asking me for copies of my notes (and I've noticed people citing an online video version which isn't ideal) and since they might be useful and any comments would help me write-up the final thesis, I thought I'd be brave and post my notes.

A million caveats apply – these were early findings, my research questions and focus have changed and I've interviewed more historians and reviewed many more participative history projects since then; as a short paper I don't address methods etc; and obviously it's only a huge part of a tiny topic… (If you're interested in crowdsourcing, you might be interested in other writing related to scholarly crowdsourcing and collaboration from my PhD, or my edited volume on 'Crowdsourcing our cultural heritage'.) So, with those health warnings out of the way, here it is. I'd love to hear from you, whether with critiques, suggestions, or just stories about how it relates to your experience. And obviously, if you use this, please cite it!

Exploring historians' resistance to crowdsourced resources

Scholarly crowdsourcing may be seen as a solution to the backlog of historical material to be digitised, but will historians really use resources created by unknown members of the public?

The Transcribe Bentham project describes crowdsourcing as 'the harnessing of online activity to aid in large scale projects that require human cognition' (Terras, 2010a). 'Scholarly crowdsourcing' is a related concept that generally seems to involve the collaborative creation of resources through collection, digitisation or transcription. Crowdsourcing projects often divide up large tasks (like digitising an archive) into smaller, more manageable tasks (like transcribing a name, a line, or a page); this method has helped digitise vast numbers of primary sources.

My doctoral research was inspired by a vision of 'participant digitization', a form of scholarly crowdsourcing that seeks to capture the digital records and knowledge generated when researchers access primary materials in order to openly share and re-use them. Unlike many crowdsourcing projects which are designed for tasks performed specifically for the project, participant digitization harnesses the transcription, metadata creation, image capture and other activities already undertaken during research and aggregates them to create re-usable collections of resources.

Research questions and concepts

When Howe clarified his original definition, stating that the 'crucial prerequisite' in crowdsourcing is 'the use of the open call format and the large network of potential laborers', a 'perfect meritocracy' based not on external qualifications but on 'the quality of the work itself', he created a challenge for traditional academic models of authority and credibility (Howe 2006, 2008). Furthermore, how does anonymity or pseudonymity (defined here as often long-standing false names chosen by users of websites) complicate the process of assessing the provenance of information on sites open to contributions from non-academics? An academic might choose to disguise their identity to mask their research activities from competing peers, from a desire to conduct early exploratory work in private or simply because their preferred username was unavailable; but when contributors are not using their real names they cannot derive any authority from their personal or institutional identity. Finally, which technical, social and scholarly contexts would encourage researchers to share (for example) their snippets of transcription created from archival documents, and to use content transcribed by others? What barriers exist to participation in crowdsourcing or prevent the use of crowdsourced content?

Methods

I interviewed academic and family/local historians about how they evaluate, use, and contribute to crowdsourced and traditional resources to investigate how a resource based on 'meritocracy' disrupts current notions of scholarly authority, reliability, trust, and authorship. These interviews aimed to understand current research practices and probe more deeply into how participants assess different types of resources, their feelings about resources created by crowdsourcing, and to discover when and how they would share research data and findings.

I sought historians investigating the same country and time period in order to have a group of participants who faced common issues with the availability and types of primary sources from early modern England. I focused on academic and 'amateur' family or local historians because I was interested in exploring the differences between them to discover which behaviours and attitudes are common to most researchers and which are particular to academics and the pressures of academia.

I recruited participants through personal networks and social media, and conducted interviews in person or on Skype. At the time of writing, 17 participants have been interviewed for up to 2 hours each. It should be noted that these results are of a provisional nature and represent a snapshot of on-going research and analysis.

Early results

I soon discovered that citizen historians are perfect examples of Pro-Ams: 'knowledgeable, educated, committed, and networked' amateurs 'who work to professional standards' (Leadbeater and Miller, 2004; Terras, 2010b).

How do historians assess the quality of resources?

Participants often simply said they drew on their knowledge and experience when sniffing out unreliable documents or statements. When assessing secondary sources, their tacit knowledge of good research and publication practices was evident in common statements like '[I can tell from] it's the way it's written'. They also cited the presence and quality of footnotes, and the depth and accuracy of information as important factors. Transcribed sources introduced another layer of quality assessment – researchers might assess a resource by checking for transcription errors that are often copied from one database to another. Most researchers used multiple sources to verify and document facts found in online or offline sources.

When and how do historians share research data and findings?

It appears that between accessing original records and publishing information, there are several key stages where research data and findings might be shared. Stages include acquiring and transcribing records, producing visualisations like family trees and maps, publishing informal notes and publishing synthesised content or analysis; whether a researcher passes through all the stages depends on their motivation and audience. Information may change formats between stages, and since many claim not to share information that has not yet been sufficiently verified, some information would drop out before each stage. It also appears that in later stages of the research process the size of the potential audience increases and the level of trust required to share with them decreases.

For academics, there may be an additional, post-publication stage when resources are regarded as 'depleted' – once they have published what they need from them, they would be happy to share them. Family historians meanwhile see some value in sharing versions of family trees online, or in posting names of people they are researching to attract others looking for the same names.

Sharing is often negotiated through private channels and personal relationships. Methods of controlling sharing include showing people work in progress on a screen rather than sending it to them and using email in preference to sharing functionality supplied by websites – this targeted, localised sharing allows the researcher to retain a sense of control over early stage data, and so this is one key area where identity matters. Information is often shared progressively, and getting access to more information depends on your behaviour after the initial exchange – for example, crediting the provider in any further use of the data, or reciprocating with good data of your own.

When might historians resist sharing data?

Participants gave a range of reasons for their reluctance to share data. Being able to convey the context of creation and the qualities of the source materials is important for historians who may consider sharing their 'depleted' personal archives – not being able to provide this means they are unlikely to share. Being able to convey information about data reliability is also important. Some information about the reliability of a piece of information is implicitly encoded in its format (for example, in pencil in notebooks versus electronic records), hedging phrases in text, in the number of corroborating sources, or a value judgement about those sources. If it is difficult to convey levels of 'certainty' about reliability when sharing data, it is less likely that people will share it – participants felt a sense of responsibility about not publishing (even informally) information that hasn't been fully verified. This was particularly strong in academics. Some participants confessed to sneaking forbidden photos of archival documents they ran out of time to transcribe in the archive; unsurprisingly it is unlikely they would share those images.

Overall, if historians do not feel they would get information of equal value back in exchange, they seem less likely to share. Professional researchers do not want to give away intellectual property, and feel sharing data online is risky because the protocols of citation and fair use are presently uncertain. Finally, researchers did not always see a point in sharing their data. Family history content was seen as too specific and personal to have value for others; academics may realise the value of their data within their own tightly-defined circles but not realise that their records may have information for other biographical researchers (i.e. people searching by name) or other forms of history.

Which concerns are particular to academic historians?

Reputational risk is an issue for some academics who might otherwise share data. One researcher said: 'we are wary of others trawling through our research looking for errors or inconsistencies. […] Obviously we were trying to get things right, but if we have made mistakes we don't want to have them used against us. In some ways, the less you make available the better!'. Scholarly territoriality can be an issue – if there is another academic working on the same resources, their attitude may affect how much others share. It is also unclear how academic historians would be credited for their work if it was performed under a pseudonym that does not match the name they use in academia.

What may cause crowdsourced resources to be under-used?

In this research, 'amateur' and academic historians shared many of the same concerns for authority, reliability, and trust. The main reported cause of under-use (for all resources) is not providing access to original documents as well as transcriptions. Researchers will use almost any information as pointers or leads to further sources, but they will not publish findings based on that data unless the original documents are available or the source has been peer-reviewed. Checking the transcriptions against the original is seen as 'good practice', part of a sense of responsibility 'to the world's knowledge'.

Overall, the identity of the data creator is less important than expected – for digitised versions of primary sources, reliability is not vested in the identity of the digitiser but in the source itself. Content found on online sites is tested against a set of finely-tuned ideas about the normal range of documents rather than the authority of the digitiser.

Cite as:

Ridge, Mia. “Early PhD Findings: Exploring Historians’ Resistance to Crowdsourced Resources.” Open Objects, March 19, 2014. https://www.openobjects.org.uk/2014/03/early-phd-findings-exploring-historians-resistance-to-crowdsourced-resources/.

References

Howe, J. (undated). Crowdsourcing: A Definition http://crowdsourcing.typepad.com

Howe, J. (2006). Crowdsourcing: A Definition. http://crowdsourcing.typepad.com/cs/2006/06/crowdsourcing_a.html

Howe, J. (2008). Join the crowd: Why do multinationals use amateurs to solve scientific and technical problems? The Independent. http://www.independent.co.uk/life-style/gadgets-and-tech/features/join-the-crowd-why-do-multinationals-use-amateurs-to-solve-scientific-and-technical-problems-915658.html

Leadbeater, C., and Miller, P. (2004). The Pro-Am Revolution: How Enthusiasts Are Changing Our Economy and Society. Demos, London, 2004. http://www.demos.co.uk/files/proamrevolutionfinal.pdf

Terras, M. (2010a) Crowdsourcing cultural heritage: UCL's Transcribe Bentham project. Presented at: Seeing Is Believing: New Technologies For Cultural Heritage. International Society for Knowledge Organization, UCL (University College London). http://eprints.ucl.ac.uk/20157/

Terras, M. (2010b). “Digital Curiosities: Resource Creation via Amateur Digitization.” Literary and Linguistic Computing 25, no. 4 (October 14, 2010): 425–438. http://llc.oxfordjournals.org/cgi/doi/10.1093/llc/fqq019

Lighting beacons: research software engineers event and related topics

I've realised that it could be useful to share my reading at the intersection of research software engineers/cultural heritage technologist/digital humanities, so at the end I've posted some links to current discussions or useful reference points and work to provide pointers to interesting work.

But first; notes from last week's workshop for research software engineers, an event for people who 'not only develop the software, they also understand the research that it makes possible'. The organisers did a great job with the structure (and provided clear instructions on running a breakout session) – each unconference-style session had to appoint a scribe and report back to a plenary session as well as posting their notes to the group's discussion list so there's an instant archive of the event.

Discussions included:

How do you manage quality and standards in training – how do you make sure people are doing their work properly, and what are the core competencies and practices of an RSE?
How should the research community recognise the work of RSEs?
Sharing Research Software
Routes into research software development – why did you choose to be an RSE?
Do we need a RSE community?
and the closing report from the Steering Committee and group discussion on what an RSE community might be or do.

I ended up in the 'How should the research community recognise the work of RSES?' session. I like the definition we came up with: 'research software engineers span the role of researchers and software engineers. They have the domain knowledge of researchers and the development skills to be able to represent this knowledge in code'. On the other hand, if you only work as directed, you're not an RSE. This isn't about whether you make stuff, it's about how much you're shaping what you're making. The discussion also teased out different definitions of 'recognition' and how they related to people's goals and personal interests; the impact of 'short-termism' and project funding on stable careers, software quality, training and knowledge sharing. Should people cite the software they use in their research in the methods section of any publications? How do you work out and acknowledge someone's contribution to on-going or collaborative projects – and how do you account for double-domain expertise when recognising contributions made in code?

I'd written about the event before I went (in Beyond code monkeys: recognising technologists' intellectual contributions, which relates it to digital humanities and cultural heritage work) but until I was there I hadn't realised the extra challenges RSEs in science face – unlike museum technologists, science RSEs are deeply embedded in a huge variety of disciplines and can't easily swap between them.

The event was a great chance to meet people facing similar issues in their work and careers, and showed how incredibly useful the right label can be for building a community. If you work with science+software in the UK and want to help work out what a research software engineer community might be, join in the RSE discussion.

If you're reading this post, you might also be interested in:

Paul Walk's 2011 posts on The Strategic Developer: How Local Developers Can Make A Difference and The Value of Local Developers raised many of the issues discussed at the RSE event and the DevCSI ('Developer Community Supporting Innovation') initiative. There's also a Developer Contact Google Group.
On Ant-Lions and Scholar-Programmers by Doug Reside – the emphasis is on scholars rather than programmers and it's also interesting for the differences between academia in the US and UK (though it doesn't directly address them)
Alan Liu's "Why I'm In It" x 2 – Antiphonal Response to Stephan Ramsay on Digital Humanities and Cultural Criticism and Stephen Ramsey's 'Why I'm In It' – if you haven't been keeping up with the digital humanities, dive into the deep end here.
Michael Widner's Towards a Front Page for the Digital Humanities #dhthis, because it's one of many posts discussing DHThis and because I forget that not everyone lives and breaths this: 'software is a form of writing/making with its own assumptions and affordances, we have a responsibility to critique the software'

In ye olden days, beacon fires were lit on hills to send signals between distant locations. These days we have blogs.

Beyond code monkeys: recognising technologists' intellectual contributions

Two upcoming events suggest that academia is starting to recognise that specialist technologists – AKA 'research software engineers' or 'digital humanities software developers' – make intellectual contributions to research software, and further, that it is starting to realise the cost of not recognising them. In the UK, there's a 'workshop for research software engineers' on September 11; in the US there's Speaking in Code in November (which offers travel bursaries and is with ace people, so do consider applying).

But first, who are these specialist technologists, and why does it matter? The UK Software Sustainability Institute's 'workshop for research software engineers' says 'research software engineers … not only develop the software, they also understand the research that it makes possible'. In an earlier post, The Craftsperson and the Scholar, UCL's James Hetherington says a 'good scientific coder combines two characters: the scholar and the craftsperson'. Research software needs people who are both scholar – 'the archetypical researcher who is driven by a desire to understand things to their fullest capability' and craftsperson who 'desires to create and leave behind an artefact which reifies their efforts in a field': 'if you get your kicks from understanding the complex and then making a robust, clear and efficient tool, you should consider becoming a research software engineer'. A supporting piece in the Times Higher Education, 'Save your work – give software engineers a career track' points out that good developers can leave for more rewarding industries, and raises one of the key issues for engineers: not everyone wants to publish academic papers on their development work, but if they don't publish, academia doesn't know how to judge the quality of their work.

Over in the US, and with a focus on the humanities rather than science, the Scholar's Lab is running the 'Speaking in Code' symposium to highlight 'what is almost always tacitly expressed in our work: expert knowledge about the intellectual and interpretive dimensions of DH code-craft, and unspoken understandings about the relation of that work to ethics, scholarly method, and humanities theory'. In a related article, Devising New Roles for Scholars Who Can Code, Bethany Nowviskie of the Scholar's Lab discussed some of the difficulties in helping developers have their work recognised as scholarship rather than 'service work' or just 'building the plumbing':

"I have spent so much of my career working with software developers who are attached to humanities projects," she says. "Most have higher degrees in their disciplines." Unlike their professorial peers, though, they aren't trained to "unpack" their thinking in seminars and scholarly papers. "I've spent enough time working with them to understand that a lot of the intellectual codework goes unspoken," she says.

Women at work on C-47 Douglas cargo transport.
LOC image via Serendip-o-matic

Digital humanists spend a lot of time thinking about the role of 'making things' in the digital humanities but, to cross over to my other domain of interest, I think the international Museums and the Web conference's requirement for full written papers for all presentations has helped more museum technologists translate some of their tacit knowledge into written form. Everyone who wants to present their work has to find a way to write up their work, even if it's painful at the time – but once it's done, they're published as open access papers well before the conference. Museum technologists also tend to blog and discuss their work on mailing lists, which provides more opportunities to tease out tacit knowledge while creating a visible community of practice.

I wasn't at Museums and the Web 2013 but one of the sessions I was most interested in was Rich Cherry and Rob Stein's 'What’s a Museum Technologist today?' as they were going to report on the results of a survey they ran earlier this year to come up with 'a more current and useful description of our profession'. (If you're interested in the topic, my earlier posts on museum technologists include On 'cultural heritage technologists', Confluence on digital channels; technologists and organisational change?, Museum technologists redux: it's not about us, Survey results: issues facing museum technologists.) Rob's posted their slides at What is a Museum Technologist Anyway? and I'd definitely recommend you go check them out. Looking through the responses, the term 'museum technologist' seems to have broadened as more museum jobs involve creating content for or publishing on digital channels (whether web sites, mobile apps, ebooks or social media), but to me, a museum technologist isn't just someone who uses technology or social media – rather, there's a level of expertise or 'domain knowledge' across both museums and technology – and the articles above have reinforced my view that there's something unique in working so deeply across two or more disciplines. (Just to be clear: this isn't a diss for people who use social media rather than build things – there's also a world of expertise in creating content for the web and social media). Or to paraphrase James Hetherington, "if you get your kicks from understanding the complex and then making a robust, clear and efficient tool, you should consider becoming a museum technologist'.

To further complicate things, not everyone needs their work to reflect all their interests – some programmers and tech staff are happy to leave their other interests outside the office door, and leave engineering behind at the end of the day – and my recent experiences at One Week | One Tool reminded me that promiscuous interdisciplinarity can be tricky. Even when you revel in it, it's hard to remember that people wear multiple hats and can swap from production-mode to critically reflecting on the product through their other disciplinary lenses, so I have some sympathy for academics who wonder why their engineer expects their views on the relevant research topic to be heard. That said, hopefully events like these will help the research community work out appropriate ways of recognising and rewarding the contributions of researcher developers.

[Update, September 2013: I've posted brief notes and links to session reports from the research software engineers event at Lighting signals: research software engineers event and related topics.]

DHOxSS: 'From broadcast to collaboration: the challenges of public engagement in museums'

I'm just back from giving at a lightning talk for the Cultural Connections strand of the Digital.Humanities@Oxford Summer School 2013, and since the projector wasn't working to show my examples during my talk I thought I'd share my notes (below) and some quick highlights from the other presentations.

Mark Doffman said that it's important that academic work challenges and provokes, but make sure you get headlines for the right reasons, but not e.g. on how much the project costs. He concluded that impact is about provocation, not just getting people to say your work is wonderful.

Gurinder Punn of the university's Isis Innovation made the point that intellectual property and expertise can be transferred into businesses by consulting through your department or personally. (And it's not just for senior academics – one of the training sessions offered to PhD students at the Open University is 'commercialising your research').

Giles Bergel @ChapBookPro spoke on the Broadside Ballads Online (blog), explaining that folksong scholarship is often outside academia – there's a lot of vernacular scholarship and all sorts of domain specialists including musicians. They've considered crowdsourcing but want to be in a position to take the contributions as seriously as any print accession. They also have an image-match demonstrator from Oxford's Visual Geometry Group which can be used to find similar images on different ballad sheets.

Christian von Goldbeck-Stier offered some reflections on working with conductors as part of his research on Wagner. And perfectly for a summer's day:

Christian quotes Wilde on beauty: "one of the great facts of the world, like sunlight, or springtime…" http://t.co/8qGE9tLdBZ #dhoxss
— Pip Willcox (@pipwillcox) July 11, 2013

My talk notes: 'From broadcast to collaboration: the challenges of public engagement in museums'

I’m interested in academic engagement from two sides – for the past decade or so I was a museum technologist; now I’m a PhD student in the Department of History at the Open University, where I’m investigating the issues around academic and ‘amateur’ historians and scholarly crowdsourcing.

As I’ve moved into academia, I’ve discovered there’s often a disconnect between academia and museum practice (to take an example I know well), and that their different ways of working can make connecting difficult, even before they try to actually collaborate. But it’s worth it because the reward is more relevant, cutting-edge research that directly benefits practitioners in the relevant fields and has greater potential impact.

I tend to focus on engagement through participation and crowdsourcing, but engagement can be as simple as blogging about your work in accessible terms: sharing the questions that drive your research, how you’ve come to some answers, and what that means for the world at large; or writing answers to common questions from the public alongside journal articles.

Plan it

For a long time, museums worked with two publics: visitors and volunteers. They’d ask visitors what they thought in ‘have your say’ interactives, but to be honest, they often didn’t listen to the answers. They’d also work with volunteers but sometimes they valued their productivity more than they valued their own kinds of knowledge. But things are more positive these days – you've already heard a lot about crowdsourcing as a key example of more productive engagement.

Public engagement works better when it’s incorporated into a project from the start. Museums are exploring co-curation – working with the public to design exhibitions. Museums are recognising that they can’t know everything about a subject, and figuring out how to access knowledge ‘out there’ in the rest of the world. In the Oramics project at the Science Museum (e.g. Oramics to Electronica or Engaging enthusiasts online), electronic musicians were invited to co-curate an exhibition to help interpret an early electronic instrument for the public.

There’s a model from 'Public Participation in Scientific Research' (or 'citizen science') I find useful in my work when thinking about how much agency the public has in a project, and it's also useful for planning engagement projects. Where can you benefit from questions or contributions from the public, and how much control are you willing to give up?

Contributory projects designed by scientists, with participants involved primarily in collecting samples and recording data; Collaborative projects in which the public is also involved in analyzing data, refining project design, and disseminating findings; Co-created projects are designed by scientists and members of the public working together, and at least some of the public participants are involved in all aspects of the work. (Source: Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education (full report, PDF, 3 MB))

Do it

Museums have learnt that engaging the public means getting out of their venues (and their comfort zones). One example is Wikipedians-in-Residence, including working with Wikipedians to share images, hold events and contribute to articles. (e.g. The British Museum and Me, A Wikipedian-in-Residence at the British Museum, The Children's Museum's Wikipedian in Residence).

It’s not always straightforward – museums don’t do ‘neutral’ points of view, which is a key goal for Wikipedia. Museums are object-centric, Wikipedia is knowledge-centric. Museums are used to individual scholarship and institutional credentials, Wikipedia is consensus-driven and your only credentials are your editing history and your references. Museums are slowly learning to share authority, to trust the values of other platforms. You need to invest time to learn what drives the other groups, how to talk with them and you have to be open to being challenged.

Mean it

Done right, engagement should be transformative for all sides. According to the National Co-ordinating Centre for Public Engagement, engagement ‘is by definition a two-way process, involving interaction and listening, with the goal of generating mutual benefit.’ Saying something is ‘open to the public’ is easy; making efforts to make sure that it’s intellectually and practically accessible takes more effort; active outreach is a step beyond open. It's not the same as marketing – it may use the same social media channels, but it's a conversation, not a broadcast. It’s hard to fake being truly engaged (and it's rude) so you have to mean it – doing it cynically doesn't help anyone.

Asking people to do work that helps your mission is a double win. For example, Brooklyn Museum's 'Freeze Tag' ask members of their community to help moderate tags entered by people elsewhere – they're trusting members of the community to clean up content for them.

Enjoy it

My final example is the National Library of Ireland on Flickr Commons, who do a great job of engaging people in Irish history, partly through their enthusiasm for the subject and partly through the effort they put into collating comments and updating their records, showing how much they value contributions.

Almost by definition, any collaboration around engagement will be with people who are interested in your work, and they’ll bring new perspectives to it. You might end up working with international peers, academics from different disciplines, practitioner groups, scholarly amateurs or kids from the school down the road. And it’s not all online – running events is a great way to generate real impact and helps start conversations with potential for future collaboration.

You might benefit too! Talking about your research sometimes reminds you why you were originally interested in it… It’s a way of looking back and seeing how far you’ve come. It’s also just plain rewarding seeing people benefit from your research, so it's worth doing well.

—
Thanks again to Pip Willcox for the invitation to speak, and to the other speakers for their fascinating perspectives. Participation and engagement lessons from cultural heritage and academia is a bit of a hot topic at the moment – there's more on it (including notes from a related paper I gave with Helen Weinstein) at Participatory Practices.

Keeping corridors clear of dragons (on agency and digital humanities tools)

A while ago I posted 'Reflections on teaching Neatline', which was really about growing pains in the digital humanities. I closed by asking 'how do you balance the need for fast-moving innovative work-in-progress to be a bit hacky and untidy around the edges with the desires of a wider group of digital humanities-curious scholars [for stable, easy-to-use software]? Is it ok to say 'here be dragons, enter at your own risk'?' Looking back, I started thinking about this in terms of museum technologists (in Museum technologists redux: it's not about us) but there I was largely thinking of audiences, and slightly less of colleagues within museums or academia. I'm still not sure if this is a blog post or just an extended comment on those post, but either way, this is an instance of posting-as-thinking.

Bethany Nowviskie has problematised and contextualised some of these issues in the digital humanities far more elegantly for an invited talk at the MLA 2013 conference. You should go read the whole thing at resistance in the materials, but I want to quickly highlight some of her points here.

She quotes William Morris: '…you can’t have art without resistance in the material. No! The very slowness with which the pen or the brush moves over the paper, or the graver goes through the wood, has its value. And it seems to me, too, that with a machine, one’s mind would be apt to be taken off the work at whiles by the machine sticking or what not' and discusses her realisation that:

"Morris’s final, throwaway complaint is not about that positive, inherent resistance—the friction that makes art—which we happily seek within the humanities material we practice upon. It’s about resistance unhealthily and inaccessibly located in a toolset. … precisely this kind of disenfranchising resistance is the one most felt by scholars and students new to the digital humanities. Evidence of friction in the means, rather than the materials, of digital humanities inquiry is everywhere evident."

And she includes an important call to action for digital humanities technologists: "we diminish our responsibility to address this frustration by naming it the inevitable “learning curve” of the digital humanities. Instead, we might confess that among the chief barriers to entry are poorly engineered and ineptly designed research tools and social systems". Her paper is also a call for a more nuanced understanding and greater empathy from tool-builders toward those who are disenfranchised by tools they didn't create and can't hack to fit their needs. It's too easy to forget that an application or toolset that looks like something I can happily pick up and play with to make it my own may well look as unfathomable and un-interrogable as the case of a mobile phone to someone else.

Digital humanities is no longer a cosy clubhouse, which can be uncomfortable for people who'd finally found an academic space where they felt at home. But DH is also causing discomfort for other scholars as it encroaches on the wider humanities, whether it's as a funding buzzword, as a generator of tools and theory, or as a mode of dialogue. This discomfort can only be exacerbated by the speed of change, but I suspect that fear of the unknown demands of DH methods or anxiety about the mental capabilities required are even more powerful*. (And some of it is no doubt a reaction to the looming sense of yet another thing to somehow find time to figure out.) As Sharon Leon points out in 'Digital Methods for Mid-Career Avoiders?', digital historians are generally 'at home with the sense of uncomfortableness and risk of learning new methods and approaches' and can cope with 'a feeling of being at sea while figuring out something completely new', while conversely 'this kind of discomfort is simply to overwhelming for historians who are defined by being the expert in their field, being the most knowledgable, being the person who critiques the shortfalls of the work of others'.

In reflecting on March 2012's Digital Humanities Australasia and the events and conversations I've been part of over the last year, it seems that we need ways of characterising the difference between scholars using digital methods and materials to increase their productivity (swapping card catalogues for online libraries, or type-writers for Word) without fundamentally interrogating their new working practices, and those who charge ahead, inventing tools and methods to meet their needs. It should go without saying that any characterisations should not unfairly or pejoratively label either group (and those in-between).

Going beyond the tricky 'on-boarding' moments I talked about in 'Reflections on teaching Neatline', digital humanities must consider the effect of personal agency in relation to technology, issues in wider society that affect access to 'hack' skills and what should be done to make the tools, or the means, of DH scholarship more accessible and transparent. Growing pains are one thing, and we can probably all sympathise with an awkward teenage phase, but as digital humanities matures as a field, it's time to accept our responsibility for the environment we're creating for other scholars. Dragons are fine in the far reaches of the map where the adventurous are expecting them, but they shouldn't be encountered in the office corridor by someone who only wanted to get some work done.

* Since posting this, I've read Stephen Ramsey's 'The Hot Thing', which expresses more anxieties about DH than I've glanced at here: 'Digital humanities is the hottest thing in the humanities. … So it is meet and good that we talk about this hot thing. But the question is this: Are you hot?'. But even here, do technologists and the like have an advantage? I'm used to (if not reconciled to) the idea that every few years I'll have to learn another programming language and new design paradigms just to keep up; but even I'm glad I don't have to keep up with the number of frameworks that front-end web developers have to, so perhaps not?

Reflections on teaching Neatline

I've called this post 'Reflections on teaching Neatline' but I could also have called it 'when new digital humanists meet new software'. Or perhaps even 'growing pains in the digital humanities?'.

A few months ago, Anouk Lang at the University of Strathclyde asked me to lead a workshop on Neatline, software from the Scholar's Lab that plots 'archives, objects, and concepts in space and time'. It's a really exciting project, designed especially for humanists – the interfaces and processes are designed to express complexity and nuance through handcrafted exhibits that link historical materials, maps and timelines.

The workshop was on Thursday, and looking at the evaluation forms, most people found it useful but a few really struggled and teaching it was also slightly tough going. I've been thinking a lot about the possible reasons for that and I'm sharing them both as a request for others to share their experiences in similar circumstances and also in the hope that they'll help others.

The basic outline of the workshop was an intros round (who I am, who they are and what they want to learn); information on what Neatline is and what it can do; time to explore Neatline and explore what the software can and can't do (e.g. login, follow the steps at neatline.org/plugins/neatline to create an item based on a series of correspondence Anouk had been working on, deciding whether you want to transcribe or describe the letter, tweaking its appearance or linking it to other items); and a short period for reflection and discussion (e.g. 'What kinds of interpretive decisions did you find yourself making? What delighted you? What frustrated you?') to finish. If you're curious, you can follow along with my slides and notes or try out the Neatline sandbox site.

The first half was fine but some people really struggled with the hands-on section. Some of it was to do with the software itself – as a workshop, it was a brilliant usability test of the admin interfaces of the software for audiences outside the original set of users. Neatline was only launched in July this year and isn't even in version 2 yet so it's entirely understandable that it appears to have a few functional or UX bugs. The documentation isn't integrated into the interface yet (and sometimes lacks information that is probably part of the shared tacit knowledge of people working on the project) but they have a very comprehensive page about working with Neatline items. Overall, the process of handcrafting timelines and maps for a Neatline exhibit is still closer to 'first, catch your rabbit' than making a batch of ready-mix cupcakes. Neatline is also designed for a particular view of the world, and as it's built on top of other software (Omeka) with another very particular view of the world (and hello, Dublin Core), there's a strong underlying mental model that informs the processes for creating content that is foreign to many of its potential users, including some at the workshop.

But it was also partly because I set the bar too high for the exercises and didn't provide enough structure for some of the group. If I'd designed it so they created a simple Neatline item by closely following detailed instructions (as I have done for other, more consciously tech-for-beginners workshops), at least everyone would have achieved a nice quick win and have something they could admire on the screen. From there some could have tried customising the appearance of their items in small ways, and the more adventurous could have tried a few of the potential ways to present the sample correspondence they were working with to explore the effects of their digitisation decisions. An even more pragmatic but potentially divisive solution might have been to start with the background and demonstration as I did, but then do the hands-on activity with a smaller group of people who were up for exploring uncharted waters. On a purely practical level, I also should have uploaded the images of the letters used in the exercise to my own host so that they didn't have to faff with Dropbox and Omeka records to get an online version of the image to use in Neatline.

And finally it was also because the group had really mixed ICT skills. Most were fine (bar the occasional bug), but some were not. It's always hard teaching technical subjects when participants have varying levels of skill and aptitude, but when does it go beyond aptitude into your attitude about being pushed out of your comfort zone? I'd warned everyone at the start that it was new software, but if you haven't experienced beta software before I guess you don't have the context for understanding what that actually means.

I should make it clear here that I think the participants' achievements outshine any shortcomings – Neatline is a great tool for people working with messy humanities data who want to go beyond plonking markers on Google Maps, and I think everyone got that, and most people enjoyed the chance to play with Neatline.

But more generally, I also wonder if it has to do with changing demographics in the digital humanities – increasingly, not everyone interested in DH is an early, or even a late adopter, and someone interested in DH for the funding possibilities and cool factor might not naturally enjoy unstructured exploration of new software, or be intrigued by trying out different combinations of content and functionality just 'to see what happens'.

Practically, more information for people thinking of attending would be useful – 'if you know x already, you'll be fine; if you know y already, you'll be bored' would be useful in future. Describing an event as 'if you like trying new software, this is for you' would probably help, but it looks like the digital humanities might also now be attracting people who don't particularly like working things out as they go along – are they to be excluded? If using software like this is the onboarding experience for people new to the digital humanities, they're not getting the best first impression, but how do you balance the need for fast-moving innovative work-in-progress to be a bit hacky and untidy around the edges with the desires of a wider group of digital humanities-curious scholars? Is it ok to say 'here be dragons, enter at your own risk'?

Slow and still dirty Digital Humanities Australasia notes: day 3

These are my very rough notes from day 3 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Quick and dirty Digital Humanities Australasia notes: day 2) held in Canberra's Australian National University at the end of March.

We were welcomed to Day 3 by the ANU's Professor Marnie Hughes-Warrington (who expressed her gratitude for the methodological and social impact of digital humanities work) and Dr Katherine Bode. The keynote was Dr Julia Flanders on 'Rethinking Collections', AKA 'in praise of collections'… [See also Axel Brun's live blog.]

She started by asking what we mean by a 'collection'? What's the utility of the term? What's the cultural significance of collections? The term speaks of agency, motive, and implies the existence of a collector who creates order through selectivity. Sites like eBay, Flickr, Pinterest are responding to weirdly deep-seated desire to reassert the ways in which things belong together. The term 'collection' implies that a certain kind of completeness may be achieved. Each item is important in itself and also in relation to other items in the collection.

There's a suite of expected activities and interactions in the genre of digital collections, projects, etc. They're deliberate aggregations of materials that bear, demand individual scrutiny. Attention is given to the value of scale (and distant reading) which reinforces the aggregate approach…

She discussed the value of deliberate scope, deliberate shaping of collections, not craving 'everythingness'. There might also be algorithmically gathered collections…

She discussed collections she has to do with – TAPAS, DHQ, Women Writers Online – all using flavours of TEI, the same publishing logic, component stack, providing the same functionality in the service of the same kinds of activities, though they work with different materials for different purposes.

What constitutes a collection? How are curated collections different to user-generated content or just-in-time collections? Back 'then', collections were things you wanted in your house or wanted to see in the same visit. What does the 'now' of collections look like? Decentralisation in collections 'now'… technical requirements are part of the intellectual landscape, part of larger activities of editing and design. A crucial characteristic of collections is variety of philosophical urgency they respond to.

The electronic operates under the sign of limitless storage… potentially boundless inclusiveness. Design logic is a craving for elucidation, more context, the ability for the reader to follow any line of thought they might be having and follow it to the end. Unlimited informational desire, closing in of intellectual constraints. How do boundedness and internal cohesion help define the purpose of a collection? Deliberate attempt at genre not limited by technical limitations. Boundedness helps define and reflect philosophical purpose.

What do we model when we design and build digital collections? We're modelling the agency through which the collection comes into being and is sustained through usage. Design is a collection of representational practices, item selection, item boundaries and contents. There's a homogeneity in the structure, the markup applied to items. Item-to-item interconnections – there's the collection-level 'explicit phenomena' – the directly comparable metadata through which we establish cross-sectional views through the collection (eg by Dublin Core fields) which reveal things we already know about texts – authorship of an item, etc. There's also collection-level 'implicit phenomena' – informational commonalities, patterns that emerge or are revealed through inspection; change shape imperceptibly through how data is modelled or through software used [not sure I got that down right]; they're always motivated so always have a close connection with method.

Readerly knowledge – what can the collection assume about what the reader knows? A table of contents is only useful if you can recognise the thing you want to find in it – they're not always self-evident. How does the collection's modelling affect us as readers? Consider the effects of choices on the intellectual ecology of the collection, including its readers. Readerly knowledge has everything to do with what we think we're doing in digital humanities research.

The Hermeneutics of Screwing Around (pdf). Searching produces a dynamically located just-in-time collection… Search is an annoying guessing game with a passive-aggressive collection. But we prefer to ask a collection to show its hand in a useful way (i. e. browse)… Search -> browse -> explore.

What's the cultural significance of collections? She referenced Liu's Sidney's Technology… A network as flow of information via connection, perpetually ongoing contextualisation; a patchwork is understood as an assemblage, it implies a suturing together of things previously unrelated. A patchwork asserts connections by brute force. A network assumes that connections are there to be discovered, connected to. Patchwork, mosaic – connects pre-existing nodes that are acknowledged to be incommensurable.

We avow the desirability of the network, yet we're aware of the itch of edge cases, data that can't be brought under rule. What do we treat as noise and what as signal, what do we deny is the meaning of the collection? Is exceptionality or conformance to type the most significant case? On twitter, @aylewis summarised this as 'Patchworking metaphor lets us conceptualise non-conformance as signal not noise'

Pay attention to the friction in the system, rather than smoothing it over. Collections both express and support analysis. Expressing theories of genre etc in internal modelling… Patchwork – the collection articulates the scholarly interest that animated its creation but also interests of the reader… The collection is animated by agency, is modelled by it, even while it respects the agency we bring as readers. Scholarly enquiry is always a transaction involving agency on both ends.

My (not very good) notes from discussion afterwards… there was a question about digital femmage; discussion of the tension between the desire for transparency and the desire to permit many viewpoints on material while not disingenuously disavowing the roles in shaping the collection; the trend at one point for factoids rather than narratives (but people wanted the editors' view as a foundation for what they do with that material); the logic of the network – a collection as a set of parameters not as a set of items; Alan Liu's encouragement to continue with theme of human agency in understanding what collections are about (e.g. solo collectors like John Soane); crowdsourced work is important in itself regardless of whether it comes up with the 'best' outcome, by whatever metric. Flanders: 'the commitment to efficiency is worrisome to me, it puts product over people in our scale of moral assessment' [hoorah! IMO, engagement is as important as data in cultural heritage]; a question about the agency of objects, with the answer that digital surrogates are carriers of agency, the question is how to understand that in relation to object agency?

GIS and Mapping I

The first paper was 'Mapping the Past in the Present' by Andrew Wilson, which was a fast run-through some lovely examples based on Sydney's geo-spatial history. He discussed the spatial turn in history, and the mid-20thC shift to broader scales, territories of shared experience, the on-going concern with the description of space, its experience and management.

He referenced Deconstructing the map, Harley, 1989, 'cartography is seldom what the cartographers say it is'. All maps are lies. All maps have to be read, closely or distantly. He referenced Grace Karskens' On the rocks and discussed the reality of maps as evidence, an expression of European expansion; the creation of the maps is an exercise in power. Maps must be interpreted as evidence. He talked about deriving data from historic maps, using regressive analysis to go back in time through the sources. He also mentioned TGIS – time-enabled GIS. Space-time composite model – when have lots and lots of temporal changes, create polygon that describes every change in the sequence.

The second paper was 'Reading the Text, Walking the Terrain, Following the Map: Do We See the Same Landscape?' by Øyvind Eide. He said that viewing a document and seeing a landscape are often represented as similar activities… but seeing a landscape means moving around in it, being an active participant. Wood (2010) on the explosion of maps around 1500 – part of the development of the modern state. We look at older maps through modern eyes – maps weren't made for navigation but to establish the modern state.

He's done a case study on text v maps in Scandinavia, 1740s. What is lost in the process of converting text to maps? Context, vagueness, under-specification, negation, disjunction… It's a combination of too little and too much. Text has information that can't fit on a map and text that doesn't provide enough information to make a map. Under-specification is when a verbal text describes a spatial phenomenon in a way that can be understood in two different ways by a competent reader. How do you map a negative feature of a landscape? i.e. things that are stated not to be there. 'Or' cannot be expressed on a map… Different media, different experiences – each can mediate only certain aspects for total reality (Ellestrom 2010).

The third paper was 'Putting Harlem on the Map' by Stephen Robertson. This article on 'Writing History in the Digital Age' is probably a good reference point: Putting Harlem on the Map, the site is at Digital Harlem. The project sources were police files, newspapers, organisational archives… They were cultural historians, focussed on individual level data, events, what it was like to live in Harlem. It was one of first sites to employ geo-spatial web rather than GIS software. Information was extracted and summarised from primary sources, [but] it wasn't a digitisation project. They presented their own maps and analysis apart from the site to keep it clear for other people to do their work. After assigning a geo-location it is then possible to compare it with other phenomena from the same space. They used sources that historians typically treat as ephemera such as society or sports pages as well as the news in newspapers.

He showed a great list of event types they've gotten from the data… Legal categories disaggregate crime so it appears more often in the list though was the minority of data. Location types also offers a picture of the community.

Creating visualisations of life in the neighbourhood…. when mapping at this detailed scale they were confronted with how vague most historical sources are and how they're related to other places. 'Historians are satisfied in most cases to say that a place is 'somewhere in Harlem'.' He talked about visualisations as 'asking, but not explaining, why there?'.

I tweeted that I'd gotten a lot more from his demonstration of the site than I had from looking at it unaided in the past, which lead to a discussion with @claudinec and @wragge about whether the 'search vs browse' accessibility issue applies to geospatial interfaces as well as text or images (i.e. what do you need to provide on the first screen to help people get into your data project) and about the need for as many hooks into interfaces as possible, including narratives as interfaces.

Crowdsourcing was raised during the questions at the end of the session, but I've forgotten who I was quoting when I tweeted, 'by marginalising crowdsourcing you're marginalising voices', on the other hand, 'memories are complicated'. I added my own point of view, 'I think of crowdsourcing as open source history, sometimes that's living memory, sometimes it's research or digitisation'. If anything, the conference confirmed my view that crowdsourcing in cultural heritage generally involves participating in the same processes as GLAM staff and humanists, and that it shouldn't be exploitative or rely on user experience tricks to get participants (though having made crowdsourcing games for museums, I obviously don't have a problem with making the process easier to participate in).

The final paper I saw was Paul Vetch, 'Beyond the Lowest Common Denominator: Designing Effective Digital Resources'. He discussed the design tensions between: users, audiences (and 'production values'); ubiquity and trends; experimentation (and failure); sustainability (and 'the deliverable'),

In the past digital humanities has compartmentalised groups of users in a way that's convenient but not necessarily valid. But funding pressure to serve wider audiences means anticipating lots of different needs. He said people make value judgements about the quality of a resource according to how it looks.

Ubiquity and trends: understanding what users already use; designing for intuition. Established heuristics for web design turn out to be completely at odds with how users behave.

Funding bodies expect deliverables, this conditions the way they design. It's difficult to combine: experimentation and high production values [something I've posted on before, but as Vetch said, people make value judgements about the quality of a resource according to how it looks so some polish is needed]; experimentation and sustainability…

Who are you designing for? Not the academic you're collaborating with, and it's not to create something that you as a developer would use. They're moving away from user testing at the end of a project to doing it during the project. [Hoorah!]

Ubiquity and trends – challenges include a very highly mediated environment; highly volatile and experimental… Trying to use established user conventions becomes stifling. (He called useit.com 'old nonsense'!) The ludic and experiential are increasingly important elements in how we present our research back.

Mapping Medieval Chester took technology designed for delivering contextual ads and used it to deliver information in context without changing perspective (i.e. without reloading the page, from memory). The Gough map was an experiment in delivering a large image but also in making people smile. Experimentation and failure… Online Chopin Variorum Edition was an experiment. How is the 'work' concept challenged by the Chopin sources? Technical methodological/objectives: superimposition; juxtaposition; collation/interpolation…

He discussed coping strategies for the Digital Humanities: accept and embrace the ephemerality of web-based interfaces; focus on process and experience – the underlying content is persistent even if the interfaces don't last. I think this was a comment from the audience: 'if a digital resource doesn't last then it breaks the principle of citation – where does that leave scholarship?'

Summary

So those are my notes. For further reference I've put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it's not on Australian time so it needs transposing to match the session times.

This was my first proper big Digital Humanities conference, and I had a great time. It probably helped that I'm an Australian expat so I knew a sprinkling of people and had a sense of where various institutions fitted in, but the crowd was also generally approachable and friendly.

I was also struck by the repetition of phrases like 'the digital deluge', the 'tsunami of data' – I had the feeling there's a barely managed anxiety about coping with all this data. And if that's how people at a digital humanities conference felt, how must less-digital humanists feel?

I was pleasantly surprised by how much digital history content there was, and even more pleasantly surprised by how many GLAMy people were there, and consequently how much the experience and role of museums, libraries and archives was reflected in the conversations. This might not have been as obvious if you weren't on twitter – there was a bigger disconnect between the back channel and conversations in the room than I'm used to at museum conferences.

As I mentioned in my day 1 and day 2 posts, I was struck by the statement that 'history is on a different evolutionary branch of digital humanities to literary studies', partly because even though I started my PhD just over a year ago, I've felt the title will be outdated within a few years of graduation. I can see myself being more comfortable describing my work as 'digital history' in future.

I have to finish by thanking all the speakers, the programme committee, and in particular, Dr Paul Arthur and Dr Katherine Bode, the organisers and the aaDH committee – the whole event went so smoothly you'd never know it was the first one!

And just because I loved this quote, one final tweet from @mikejonesmelb: Sir Ken Robinson: 'Technology is not technology if it was invented before you were born'.

Quick and dirty Digital Humanities Australasia notes: day 2

What better way to fill in stopover time in Abu Dhabi than continuing to post my notes from DHA2012? [Though I finished off the post and re-posted once I was back home.] These are my very rough notes from day 2 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Slow and still dirty Digital Humanities Australasia notes: day 3). In the interests of speed I'll share my notes and worry about my own interpretations later.

Keynote panel, 'Big Digital Humanities?'

Day 2 was introduced by Craig Bellamy, and began with a keynote panel with Peter Robinson, Harold Short and John Unsworth, chaired by Hugh Craig. [See also Snurb's liveblogs for Robinson, Short and Unsworth.] Robinson asked 'what constitutes success for the digital humanities?' and further, what does the visible successes of digital humanities mask? He said it's harder for scholars to do high quality research with digital methods now than it was 20 years ago. But the answer isn't more digital humanists, it's having the ingredients to allow anyone to build bridges… He called for a new generation of tools and methods to support the scholarship that people want to do: 'It should be as easy to make a digital edition (of a document/book) as it is to make a Facebook page', it shouldn't require collaboration with a digital humanist. To allow data made by one person to be made available to others, all digital scholarship should be made available under a Creative Commons licence (publishers can't publish it now if it's under a non-commercial licence), and digital humanities data should be structured and enriched with metadata and made available for re-use with other tools. The model for sustainability depends on anyone and everyone being able to access data.

Harold Short talked about big (or at least unescapable) data and the 'Svensson challenge' – rather than trying to work out how to take advantage of infrastructure created by and for the sciences, use your imagination to figure out what's needed for the arts and humanities. He called for a focus on infrastructure and content rather than 'data'.

John Unsworth reminded us that digital humanities is a certain kind of work in the humanities that uses computational methods as its research methods. It's not just using digital materials, though it does require large collections of data – it also requires a sense of how how the tools work.

What is the digital humanities?

Very different versions of 'digital humanities' emerged through the panel and subsequent discussion, leaving me wondering how they related to the different evolutionary paths of digital history and digital literature studies mentioned the day before. Meanwhile, on the back channel (from the tweets that are to hand), I wondered if a two-tier model of digital humanities was emerging – one that uses traditional methods with digital content (DH lite?); another that disrupts traditional methods and values. Though thinking about it now, the 'tsunami' of data mentioned is disruptive in its own right, regardless of the intentional choices one makes about research practices (which might have been what Alan Liu meant when he asked about 'seamless' and 'seamful' views of the world)…. On twitter, other people (@mikejonesmelb, @bestqualitycrab, @1n9r1d) wondered if the panel's interpretation of 'big' data was gendered, generational, sectoral, or any other combination of factors (including as the messiness and variability of historical data compared to literature) and whether it could have been about 'disciplinary breadth and inclusiveness' rather than scale.

Data morning session

The first speaker was Toby Burrows on 'Using Linked Data to Build Large‐Scale e‐Research Environments for the Humanities'. [Update: he's shared his slides and paper online and see also Snurb's liveblog.] Continuing some of the themes from the morning keynote panel, he said that the humanities has already been washed away in the digital deluge, the proliferation of digital stuff is beyond the capacity of individual researchers. It's difficult to answer complex humanities questions only using search with this 'industrialised' humanities data, but large-scale digital libraries and collections offer very little support for functions other than search. There's very little connection between data that researchers are amassing and what institutions are amassing.

He's also been looking at historians/humanists research practices [and selfishly I was glad to see many parallels with my own early findings]. The tools may be digital rather than paper and scissors, but historians are still annotating and excerpting as they always have. The 'sharing' part of their work has changed the most – it's easier to share, and they can share at an earlier stage if they choose to do that, but not a lot has changed at the personal level.

Burrows said applying applying linked data approach to manuscript research would go a long way to addressing the complexity of the field. For example, using global URIs for manuscripts and parts; separating names and concepts from descriptive information; and using linked data functions to relate scholarly activities (annotations, excerpts, representations etc) to manuscript descriptions, objects and publications. Linked data can provide a layer of entities that sits between research activities and descriptions/collections/publications, which avoids conflating the entities and the source material. Multiple naming schemes are necessary for describing entities and relationships – there's no single authoritative vocabulary. It's a permanent work in progress, with no definitive or final structure. Entities need to include individuals as well as categories, with a network graph showing relatedness and the evidence for that relatedness as the basic structure.

He suggested a focus on organising knowledge, not collections, whether objects or texts. Collaborative activities should be based around this knowledge, using tools that work with linked data entities. This raised the issue of contested ground and the application of labels and meaning to data: your 'discovery' is my 'invasion'. This makes citizen humanities problematic – who gets to describe, assign, link, and what does that mean for scholarly authority?

My notes aren't clear but I think Burrows said these ideas were based on analysis of medieval manuscript research, which Jane Hunter had also worked on, and they were looking towards the architecture for HuNI. It was encouraging to see an approach to linked data so grounded in the complexity of historians research practices and data, and is yet another reason I'm looking forward to following HuNI's progress – I think it will have valuable lessons for linked data projects in the rest of the world. [These slides from the Linked Open Data workshop in Melbourne a few weeks later show the academic workflow HuNI plans to support and some of the issues they'll have to tackle.]

The second speaker was the University of Sydney's Stephen Hayes on 'how linked is linked enough?'. [See also Snurb's liveblog.] He's looking at projects through a linked data lens, trying to assess how much further projects need to go to comfortably claim to be linked data. He talked about the issues projects encountered trying to get to be 5 star Linked Data.

He looked at projects like the Dictionary of Sydney, which expresses data as RDF as well in a public-facing HTML interface and comes close to winning 5 stars. It is a demonstration of the fact that once data is expressed in one form, it can be easily expressed in another form – stable entities can be recombined to form new structures. The project is powered by Heurist, a tool for managing a wide range of research data. The History of Balinese Painting could not find other institutions that exposed Balinese collection data in programmable form so they could link to them (presumably a common problem for early adopters but at least it helps solve the 'chicken or the egg' problem that dogs linked data in cultural heritage and the humanities). The sites URLs don't return useful metadata but they do try to refer to image URLs so it's 'sorta persistent'. He gave it a rating of 3.5 stars. Other projects mentioned (also built on Heurist?) were the Charles Harpur Critical Archive, rated at 3.5 stars and Virtual Zagora, rated at 3 stars.

The paper was an interesting discussion of the team work required to get the full 5 stars of linked data, and the trade-offs in developing functions for structured data (e.g. implementing schema.org's painting markup versus focussing on the quality of the human-facing pages); reassuring curators about how much data would be released and what would be kept back; developing ontologies throughout a project or in advance and the overhead in mapping other projects concepts to their own version of Dublin Core.

The final paper in the session was 'As Curious An Entity: Building Digital Resources from Context, Records and Data' by Michael Jones and Antonina Lewis (abstract). [See also Snurb's liveblog.] They said that improving the visibility of relationships between entities enriches archives, as does improving relationships between people. The title quote in full is 'as curious an entity as bullshit writ on silk' – if the parameters, variables and sources of data are removed from material, then it's just bullshit written on silk. Visualisations remove sources, complexity and 'relative context', and would be richer if they could express changes in data over time and space. They asked how one would know that information presented in a visualisation is accurate if it doesn't cite sources? You must seek and reference original material to support context layers.

They presented an overview of the Saulwick Archive project (Saulwick ran polls for the Fairfax newspapers for years) and the Australian Women's Register, discussed common issues faced in digital humanities, and the role of linked data and human relationships in building digital resources. They discussed the value of maintaining relationships between archives and donors after the transfer of material, and the need to establish data management plans to make provision for raw data and authoritative versions of related contextual material, and to retain data to make sense of the archives in the future. The Australian Women's Register includes content written for the site and links out to the archival repositories and libraries where the records are held. In a lovely phrase, they described records as the 'evidential heart' for the context and data layers. They also noted that the keynote overlooked non-academic re-use of digital resources, but it's another argument for making data available where possible.

Digital histories session

The first paper was 'Community Connections: The Renaissance of Local History' by Lisa Murray. Murray discussed the 'three Cs' needed for local history: connectivity, community, collaboration.

Is the process of geo-referencing forcing historians to be more specific about when or where things happened? Are people going from the thematic to the particular? Is it exciting for local historians to see how things fit into state or national narratives? Digital history has enormous potential for local and family history and to represent complicated relationships within a community and how they've changed over time. Digital history doesn't have to be article-centric – it enables new forms of presentation. Historians have to acknowledge that Wikipedia is aligned to historians' processes. Local history is strongly represented on Wikipedia. The Dictionary of Sydney provides a universal framework for accessing Sydney's history.

The democratisation of historical production is exciting but raises it challenges for public understandings of how history undertaken and represented. Are some histories privileged? Making History (a project by Museum Victoria and Monash University) encourages the use of online resources but does that privilege digitised sources, and will others be neglected? Are easily accessible sources privileged, and does that change what history is written? What about community collections or vast state archives that aren't digitised?

History research methodologies are changing – Google etc is shaping how research is undertaken; the ubiquity of keyword searching reinforces the primacy of names. She noted the impact of family historians on how archives prioritise work. It's not just about finding sources – to produce good history you need to analyse the sources. Professional historians are no longer the privileged producers of knowledge. History can be parochial, inclusive, but it can also lack sense of historical perspective, context. Digital history production amplifies tensions between popular history and academic history [and presumably between amateur and academic historians?].

Apparently primary school students study more local history than university students do. Local and community history is produced by broad spectrum of community but relatively few academic historians are participating. There's a risk of favouring quirky facts over significance and context. Unless history is more widely taught, local history will be tarred with same brush as antiquarians. History is not only about narrative and context… Historians need to embrace the renaissance of local and community history.

In the questions there was some discussion of the implications of Sydney's city archives being moved to a more inconvenient physical location. The justification is that it's available through Ancestry but that removes it from all context [and I guess raises all the issues of serendipity etc in digital vs physical access to archives].

The next speaker was Tim Sherratt on 'Inside the bureaucracy of White Australia'. His slides are online and his abstract is on the Invisible Australians site. The Invisible Australians project is trying to answer the question of what the White Australia policy looked like to a non-white Australian. He talked about how digital technology can help explore the practice of exclusion as legislation and administrative processes were gradually elaborated. Chinese Australians who left Australia and wanted to return had to prove both their identity and their right to land to convince officials they could return: 'every non-white resident was potentially a prohibited immigrant just waiting to be exposed'. He used topic modelling on file titles from archival series and was able to see which documents related to the White Australia policy. This is a change from working through hierarchical structures of archives to working directly through the content of archives. This provides a better picture of what hasn't survived, what's missing and would have many other exciting uses. [His post on Topic modelling in the archives explains it better than my summary would.]

The final paper was Paul Turnbull on 'Pancake history'. He noted that in e-research there's a difference between what you can use in teaching and what makes people nervous in the research domain. He finds it ironic that professional advancement for historians is tied to writing about doing history rather than doing history. He talked about the need to engage with disciplinary colleagues who don't engage with digital humanities, and issues around historians taking digital history seriously.

Sherratt's talk inspired discussion of funding small-scale as well as large-scale infrastructure, possibly through crowdfunding. Turnbull also suggested 'seeding ideas and sharing small apps is the way to go'.

[Note from when I originally posted this: I don't know when my flight is going to be called, so I'll hit publish now and keep working until I board – there's lots more to fit in for day 2! In the afternoon I went to the 'Digital History' session. I'll tidy up when I'm in the UK as I think blogger is doing weird LTR things because it may be expecting Arabic.]

Quick and dirty Digital Humanities Australasia notes: day 1

As always, I should have done this sooner and tidied them up more, but better rough notes than nothing, so here goes… The Australasian Association for Digital Humanities held their inaugural conference in Canberra in March, 2012. You can get an overall sense of the conference from the #DHA2012 tweets (I've put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it's not on Australian time) and from the keynotes.

In his opening keynote on the movements between close and distant reading, Alan Liu observed that the crux of the 'reading' issue depends on the field, and further, that 'history is on a different evolutionary branch of digital humanities to literary studies'. This is something I've been wondering about since finding myself back in digital humanities, and was possibly reflected in the variety of papers in the overall programme. I was generally following sessions on digital history, geospatial themes and crowdsourcing, but there was so much in the programme that you could have followed a literary studies line and had a totally different conference experience.

In the next session I went to a panel on 'Connecting Australia's Cultural Datasets: A Vision for Collaboration' with various people from the new 'Humanities Networked Infrastructure' (HuNI) (more background) presenting. It started with Deb Verhoeven on 'jailbreaking cultural data' and the tension identified by Brand: "information wants to be expensive because it's so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is lower and lower all the time. So you have these two things fighting against each other". 'Information wants to be social': she discussed the need to understand the value of research in terms of community engagement, not just as academically ranked output, and to return research to the communities they're investigating in meaningful ways.

Other statements that resonated were the need for organisational, semantic and technical interoperability in datasets to create collaborative environments. Collaboration requires data integration and exchange as well as dealing with different ideas about what 'data' is in different disciplines in the humanities. Collaboration in the cultural datasets community can follow unmet needs: discover data that's currently hidden, make connections between disparate data sources, publish and share connections.

Ross Harley talked about how interoperability facilitates serendipity and trying to find new ways for data to collide. In the questions, Ingrid Mason asked about parallels with the GLAM (galleries, libraries, archives and museums) community, but it was also pointed out that GLAMs are behind in publishing their data – not everything HuNI wants to use is available yet. I pointed out (on the twitter back channel) that requests for GLAM information from intensive users (e.g. researchers) helps memory institutions make the case for publishing more data – it's still all a bit chicken-or-the-egg.

After lunch I went to the crowdsourcing session (not least cos I was presenting early results from my PhD in it). The first presentation was on 'crowdsourcing semantic tags on 3D museum artefacts' which could have amazing applications for teaching material culture and criticism as well as source communities because it lets people annotate specific locations on a 3D model. Interestingly, during the questions someone reported people visiting campus classics museum who said they were enjoying seeing the objects in person but also wanted access to electronic versions – it's fascinating watching audience expectations change.

The next presentation was on 'Optimising crowdsourcing websites to increase volunteer participation' which was a case study of NYPL's What's on the menu by Donelle McKinley who was using MECLAB/Flint McGlaughlin's Conversion Sequence heuristic (clarity of value proposition, motivation, incentive, friction, anxiety) to assess how the project's design was optimised to motivate audience participation. Donelle's analysis is really useful for people thinking about designing for crowdsourcing, but I'm not sure my notes do it justice, and I'm afraid I didn't get many notes for Pauline Cockrill's 'Using Web 2.0 to make new connections in community history' as I was on just afterwards. One point I tweeted was about a quick win for crowdsourcing in using real-world communities as pointers to successful online collaborations, but I'm not sure now who said it.

One comment I noted during the discussion was "a real pain about Old Weather was that you'd get into working on a ship and it would just sail off on you" – interfaces that work for the organisation doesn't always work for the audience. This session was generally useful for clarifying my thoughts on the tension between optimising for efficiency or engagement in cultural heritage crowdsourcing projects.

In the interests of getting this posted I'll stop here and call this 'day 1'. I'm not sure if any of the slides are available yet, but I'll update and link to any presentations or other write-ups I find. There's a live blog of many sessions at http://snurb.info/taxonomy/term/137.

[Update: I've posted about Day 2 at Quick and dirty Digital Humanities Australasia notes: day 2 and Slow and still dirty Digital Humanities Australasia notes: day 3.]

How to get published – Interface 2011 conference notes

These are my notes from the 'how to get published' session at InterFace 2011 – I've summarised some of the advice here in case it may help others, with the usual caveat that any mistakes are mine, etc.

Charlotte Frost spoke (slides) about 'PhD2Published', a site with advice, support and discussion about getting academic work published. As the site says, "Don’t underestimate how much of getting published comes down to knowing: A) How publishing works and what’s expected of you as a writer. B) Being professional, adaptable and easy to work with". She made the excellent point that if the jobs aren't out there, you could pour your energies into getting your book pitched and written. You also need to work out whether a book, journal articles or a mixture would work best for you (especially, I'd imagine, as publishers are taking on fewer books in this financial environment). Thinking of academic publishing as part of the incremental progression of your career is useful – you don't need to cram everything into one book.

Specific tips included:

make the book what you wish your thesis had been
thinking about the book you wish you'd had available as an undergraduate also helps make your book marketable
collect a list of courses that would put your book on their reading list (and why)
consider the way that your book contributes to the identity of the publishing house and could make it a covetable feature
bear the current financial situation in mind and include as much solid sales evidence as you can
look at how publishing is changing and think about appropriate formats for your work
think about where audiences for your work might be
find out how publishers would like you to pitch and stick to their guidelines
the tone of your pitch should be about why your book is a must-read (not a must-write)
look for series or lists with publishers and tell them how your book would fit in that strand
nail the very short text-only description right from the start
find out if there are grants or awards that could support the publication of your book and let the publisher know
line up a well-known and relevant academic to write a foreword for your book
build and promote an expertise that's tangential and helps bring other people to your work.

The next speaker was Ashgate's Dymphna Evans with lots of useful and realistic advice on 'Publishing your Monograph' (slides). She started with the importance of choosing the right publisher – find someone who peer reviews, talk to colleagues about their experiences, and find publishers with lists or series in your field. Interestingly, she said it's ok to choose more than one publisher (it will speed up the process, and you'll get more feedback on your proposals), unless of course a publisher contacted you first.

Following the guidelines on a publisher's website is vital – and check your proposal once you've completed it. You can send sample chapters but she doesn't recommend you write the whole thing upfront in this current financial environment. Don't send stuff you feel will need more work – publishers don't have time to deal with it. Be aware of commercial considerations (most publishers require a minimum sale (maybe 300 books) but it doesn't have to be a best seller). Be prepared to re-write your thesis. It helps to have published journal articles based on parts of your thesis if they can be re-written for the book. Ashgate have a guide on 'transforming your thesis into a book' (PDF) on their website, and they also have general Proposal Guidelines for Humanities and Social Science authors.

Tips for your book proposal – choose a good title and prepare a thorough synopsis of each chapter. Be realistic about the deliery date. Think about illustrations (e.g. copyright). Don't undersell yourself as an author. Consider the audience for your book (e.g. in Digital Humanities, don't underestimate the professional audience for your book… draw out the practical applications of your research for professionals.). Ensure the proposal covers everything.

When making decisions, publishers consider factors including whether your book may fit in a series and whether it will meet sales expectations, and your proposal is peer-reviewed. Peer reviews are subjective, so don't be discouraged if they're negative.

If you get a publishing contract – read through it, check clauses with publisher if you're not happy or don't understand them. Check delivery date and conditions of delivery. Check which rights you are transferring (don't need copyright, just publication rights). Is an e-book planned?

Read the publishers guidelines before preparing your final manuscript; clear all your copyright permissions and think about illustrations. [Which is useful advice even if you're just writing a book chapter]. The editorial process includes a peer review of the final text (allow 8-10 weeks); marketing; editorial work; then finally the book is published (5-6 months after submitting)!

The final presentation in this session was Julianne Nyhan on 'Book reviewing and the post-graduate' (slides). Despite the title, she included websites, exhibitions, emerging technologies as well as books in her tips. Interdisciplinary Science Reviews publishes traditional reviews of about 2,500 words, and 'review articles' of about 7,000 words. Review articles are a a synthesis of existing works with the aim of reaching new conclusion or interpretations.

At the simplest level, reviewing books is a way to expand your library. Reviews aren't peer reviewed in the strictest sense (though there is a quality bar), but review articles consistently appear among most cited papers in a given field, and it's a way for post-graduate students to use stuff they can't include in their thesis while getting their name and expertise known out there. It also gives you experience working with editors and publishers.

How to go about publishing book reviews:

Identify appropriate journals, establish their scope and mission, and review their reviews.
Write a short email to Book Reviews Editor including: research area; details of previous reviews or publications; books requested/suggested (or types if nothing currently listed). Make a reasonable impression in your cover note.
Agree on a realistic date for submission and keep to it. Iterate with editor about corrections and finally proof copies of work.

There's lots of information online on the hallmarks of a good review – it's not simply a summary but a contextualisation of research – how does it relate to others in the field? Does it advance knowledge in some way? Discussion of the work in the wider intellectual context is an opportunity for you to make interesting connections and bring your personal viewpoint to the review. Be fair and balanced with well-justified and accurate criticisms/points of approval. Never use a big word where a small word will do; never use two words when one will do. Be careful of jargon – ask a colleague in another field to read.

You should look at journal ranking when identifying journals, but maybe rank is less important than whether the journal is open access (and is therefore likely to have higher impact).