New challenges in digital history: sharing women's history on Wikipedia – my talk notes

I'm at The Albert M. Greenfield Digital Center for the History of Women's Education at Bryn Mawr College for the inaugural Women's History in the Digital World Conference. Since I'm about to speak and ask historians to share their research and write history in public, I thought I should also be brave and share my draft talk notes (which I've now updated with formatted references, though Blogger is still re-formatting things slightly oddly).

Introduction: New challenges in digital history: sharing women's history on Wikipedia

[slide – title, my details]
Hi, I'm Mia. I'm doing a PhD on scholarly crowdsourcing, or collaboratively creating online resources, and on the impact of digitality on the practices of historians, so this paper is indirectly related to my research but isn't core to it.
I proposed this paper as a deliberate provocation: 'if we believe the subjects of our research are important, then we should ensure they are represented on freely available encyclopedic sites like Wikipedia'. Just in case you're not familiar with it, Wikipedia is a free online encyclopedia 'that anyone can edit.' It contains 25 million articles in English and 285 other languages, over 4 million of them in English, and has 100,000 active contributors [1].

'Brilliant Women' at the National Portrait Gallery

The genesis of this paper was two-fold. The 2008 exhibition 'Brilliant Women: 18th Century Bluestockings' at the UK National Portrait Gallery made the point that 'Despite the fact that "bluestockings" made a substantial contribution to the creation and definition of national culture their intellectual participation and artistic interventions have largely been forgotten'. As a computer programmer, I'm driven crazy by reinventing the wheel and other inefficient processes, and I began to think about how digital publishing could intervene in the cycle of remembering and forgetting that seemed to be the fate of brilliant women throughout history. How could historians use digital platforms to stop those histories being lost and make them easy for others to find?

[Screenshot – Caitlin Moran quote from How to be a woman: 'Even the ardent feminist historian, male or female – citing Amazons and tribal matriarchies and Cleopatra – can't conceal that women have done basically f*ck-all for the last 100,000 years']
A few years later, by then a brand-new PhD student, I attended the Women's History Network conference in London in 2011 and learnt of so many interesting lives that challenged conventional mainstream historical narratives of gender. I wished that others could hear those stories too. But when I asked if any of these histories were available outside academia on sites like Wikipedia, there was a strong sense that editing Wikipedia was something that other people did. But who better to make a case for better representation of women's histories than the people in that room? Who else has the skills, the knowledge and the passion? Some academic battles may have been won regarding the importance of women's histories, but representing those histories on the sites where ordinary people start their queries is also hugely important. The quote on this slide illustrates why – even if it was meant in jest, it represents a certain world view.

WikiWomen's Collaborative

[slide – logos from http://en.wikipedia.org/wiki/Wikipedia:WikiWomen%27s_History_Month http://meta.wikimedia.org/wiki/WikiWomen%27s_Collaborative ]
Of course, I'm not the first, and definitely not the most qualified, to make this point. I would also like to acknowledge the work of many groups and individuals, particularly within Wikipedia, that has preceded this [2].

[slide – Scripps editathon, #tooFEW]
Things move fast in the digital world and we're at a different moment than the one when I proposed this paper. Gender issues on Wikipedia have been discussed for a number of years, but there's been a recent burst of activity, including the #tooFEW ('Feminists Engage Wikipedia') editathons, held online and in person across four physical sites [4][5]. (An editathon is 'a scheduled time where people edit Wikipedia together, whether offline, online, or a mix of both' [3].) I was going to be provocative and ask you to create Wikipedia entries about the histories you've invested so much in researching, but some of that is already happening. As a result, this is version 2 of this paper, but my starting question remains the same: assuming we believe that women's history is important, what's wrong with our current methods of research dissemination and dialogue?

The case of the Invisible Scholarship

[slide – outline of section]
Cumulative centuries of archival and theoretical work have been spent recovering women's histories, yet much of this inspiring scholarship might as well not exist when so few people have access to it. Sadly, scholarship that isn't deliberately made public is currently invisible outside academia. The open access movement, with all its thorny complications, is one potential solution. Engaging in new forms of open scholarship and disseminating research on sites where the public already goes to learn about history is another.

If it's not Googleable, it doesn't exist.

[slide – screenshot of unsuccessful search for Ina von Grumbkow]
Most content searches start and end online. The content and links available to search engines inform their assumptions about the world, and they in turn shape the world view presented on the results screen. If the name of a historical figure doesn't show up in Google, how else would someone find out about them? While college students might be heavy users of Google Scholar, Google's specialist academic search, members of the public are unlikely to come across scholarship there by accident, not least because there's a 'semantic gap' between the language used in academia and the language used in everyday speech. Writing for Wikipedia means writing in everyday language, and the site is heavily indexed by search engines – it doesn't take long for content created on Wikipedia – even on a user's talk page rather than the main site – to show up in Google results. So one reason to take history on Wikipedia seriously is that it affects what search engines know about the world.

'Did you mean… hegemony?'

Search for 'Viscountess Ranelagh' and Google asks 'Did you mean Viscount?' No.

[slide – screenshot of search for 'Viscountess Ranelagh and the Authorisation of Women's Knowledge in the Hartlib Circle'; Google asks 'Did you mean Viscount?' No.]
Scholarship and sources contained in specialist online archives and repositories are often off-limits to the Google bots that crawl the web looking for content to index. Because search engines normalise certain assumptions about the world, getting more content about women's histories into publicly accessible spaces will eventually have an effect on the algorithms that determine suggestions for 'did you mean', etc. Contributions to sites like Wikipedia can eventually become contributions to the 'knowledge graphs' that determine the answers to questions we ask online.

If it's behind a paywall, it only exists for a privileged few

[Slide – Screenshot of blocked attempt to access 'Wives and daughters of early Berlin geoscientists and their work behind the scenes']
Specialist users will be able to find academic research via Google Scholar, but any independent scholars in attendance will be able to speak to the difficulties in gaining access to journal articles without membership of an institutional library. Journal articles obviously have a lot of value within academic communities, but the research they represent is only available to a privileged few.

Why does Wikipedia matter?

[slide: For some, Wikipedia is the font of all wisdom]
Wikipedia is one of the most visited websites in the world. As one commentator said, 'people turn to Wikipedia as an objective resource' but 'it's not so objective in many ways.' [6]

However, as the free online encyclopedia 'that anyone can edit', it also provides the ability to take direct action to fix the under-representation of women's history. William Cronon, President of the AHA, said, 'Wikipedia provides an online home for people interested in histories long marginalized by the traditional academy' [7] – this may not be entirely true yet, but we can hope.

Wikipedia is not yet encyclopedic

[Slide – Ina screenshot]
The English version of Wikipedia has over 4 million articles, but it still has some way to go to become truly encyclopedic. Martha Saxton has noted the absence of women's history content on Wikipedia and was distressed by 'its superficiality and inaccuracies when present' [8]. Just as female assistants, secretaries, collectors, illustrators, correspondents, translators, salonists, cataloguers, textbook writers, popularisers, explorers, pioneers and colleagues have been left out of traditional academic histories and gradually reclaimed by historians, they are often still invisible on Wikipedia. This may be partly because not enough women edit Wikipedia – as Wikipedia User Gobonobo says, 'editors often contribute to topics they are familiar with and that concern them […] This systemic bias has the potential to exacerbate an historical record that already gives undue emphasis to men.' [9]

The under-representation of women's history undermines Wikipedia's claim to be encyclopedic. Issues include missing entries or omissions in coverage for existing topics, entries with inaccurate content, a failure to represent a truly 'neutral point of view', and a representation of 'male' as the default gender.

Many notable women have been buried in pages titled for their husbands, brothers, tutors, etc. In 1908 Ina von Grumbkow undertook an expedition to Iceland. She later made significant contributions to the field of natural history and wrote several books, but other than passing references online and a mention on her husband's Wikipedia page, her story is only available to those with access to sources like the 'Earth Sciences History' journal [10][11].

[Slide: 'Main articles: List of Fellows of the Royal Society and List of female Fellows of the Royal Society'.]
Some of the categories used in Wikipedia posit the default gender as male. For example, there's a 'List of Fellows of the Royal Society' and a 'List of female Fellows of the Royal Society'.

Wikipedia and the challenges of digital history

Writing for Wikipedia encapsulates many, but not all, of the challenges of digital history.

New forms of writing

Writing for Wikipedia calls upon historians to write engaging, intellectually accessible, succinct text that still accurately represents its subject. It not only means valuing the work and skills in writing public history, it requires the ability to write history in public.

Writing for a 'neutral point of view' – one of the key values of Wikipedia – is challenging for historians. Many may find it difficult to believe that it's even possible, and it's difficult to achieve [12].

Unlike traditional historical scholarship, characterised by 'possessive individualism' [13] and honed to perfection before publication, Wikipedia entries are considered a work in progress, and anyone who spots an issue is asked to fix it themselves or flag it for others to review.

It won't advance your career

While it might have a large public impact, editing Wikipedia is work that isn't credited in academia, and it takes time that could be used for projects that would count for career advancement. More importantly from Wikipedia's point of view, you can't promote your own work on the site, so writing about your own research interests is not straightforward if not many people have published in your area of expertise.

“On the internet, nobody knows you're a professor”

In a comment with 'pointers for academics who would like to contribute to Wikipedia' on a Chronicle article, commentator 'operalala' said, '"On the internet nobody knows you're a professor." If you're used to deferential treatment at your home institution, you'll be treated like everybody else in the Wide Open Internet.'[14] Or in William Cronon's words, you must 'give up the comfort of credentialed expertise'.[15] Anyone can edit, re-shape or even delete your work.

Just like academia, Wikipedia has ways of establishing the credibility and reputation of a contributor, and just like any other community, it has etiquettes and conventions to observe. Claire Potter warns that it's important for newcomers to the community not to think of Wikipedia as 'another realm for intellectuals to colonize and professionalize' [16].

The opportunities and challenges of women's history as public history on Wikipedia

Opportunities

#WomenSciWP editathon at the Royal Society

Wikipedia uses red links to represent entries that could be created but don't yet exist. Women's history editathons often create lists of red-linked names as suggested topics [17]. Projects on and outside Wikipedia, and events at institutions like the Smithsonian and the Royal Society (and, just last weekend, at three THATCamps across the United States), might be part of a critical mass of people learning how to edit Wikipedia to better include women's history.

Compared to the lengthy process of writing for academic publication, a new Wikipedia entry can be created in a few hours, allowing for time to structure the content and format the references as necessary to pass the first quality bar. An existing entry can be corrected in minutes. Each editathon or personal edit improves the representation of women's history, and there's something very satisfying about turning red links blue.

Ina von Grumbkow's name red-linked on her husband's Wikipedia page

Adding the brackets that turn a piece of text into a red link, suggesting the possibility of an entry to be created, is a small but potentially powerful intervention. Red links can render the gaps and silences visible.

Resistance

Creating or editing entries on women's history may be relatively easy, but making sure they stay there is less so. There are countless examples of women having to fight to keep their changes in as other editors revert them or argue about their choice of sources or the significance or notability of their topic. Wikipedians are zealous in preventing spammers and crackpots from polluting the site, which explains some of the rapid 'nominations for deletion', but some pockets of the site are also hostile to women's history or to women themselves.

Saxton said editing Wikipedia is 'not for the faint of heart' and 'a lesson in how little women's history has penetrated mainstream culture'. There's work to be done in sharing and normalising an understanding of the historical circumstances and cultural contexts that created difficulties for women. We might know that, as Janet Abbate said, 'The laws and social conventions of a given time and place strongly shape the kinds of technical training available to women and men, the career options open to them, their opportunities for advancement and recognition' [18] but until other Wikipedians understand that, there will continue to be issues around 'notability'. Having those conversations as many times as necessary might be tiring and uncomfortable or even controversial, but it's part of the work of representing women's history on Wikipedia.

Tensions

'Reliable sources'

Wikipedians may have different definitions of 'reliable sources' than scholarly researchers. As one academic discovered:
"Wikipedia is not 'truth,' Wikipedia is 'verifiability' of reliable sources. Hence, if most secondary sources which are taken as reliable happen to repeat a flawed account or description of something, Wikipedia will echo that." [19]

The same gatekeepers matter

As some academics have found, 'Wikipedia differs from primary-source research, from scholarly writing, and how it privileges existing rather than new knowledge' [20][21]. Wikipedia is not the place to redress fundamental issues with silences in the archives or in the profession overall, not least because on Wikipedia, primary research is bad and secondary sources are good [22]. This puts the onus back on traditional academic publishing in peer-reviewed journals and books that can be cited in Wikipedia articles, though other published works such as 'credible and authoritative books' and 'reputable media sources' can also be cited.

'Notability'

'A person is presumed to be notable if he or she has received significant coverage in reliable secondary sources that are independent of the subject. […] the person who is the topic of a biographical article should be "worthy of notice" – that is, "significant, interesting, or unusual enough to deserve attention or to be recorded" within Wikipedia as a written account of that person's life.' [23] 'The common theme in the notability guidelines is that there must be verifiable, objective evidence that the subject has received significant attention from independent sources to support a claim of notability.' [24] This creates obvious difficulties for some women's histories.

It's also difficult to judge where 'notability' should end. When does focusing on exceptional women become counter-productive? When do we risk creating a new canon? When does it stop being remarkable that a woman became prominent in a field and start being more accepted, if still not expected? [25] At what point should writing shift from individual entries to integration into more general topics?

Conclusion

Sometimes it's hard to tell whether Wikipedia lags behind academia's acceptance and general integration of women's history into mainstream history or whether it is representative of the field's more conservative corners. Recent digital history projects are doing a good job of explaining some of the issues with key sources for Wikipedia like the Oxford Dictionary of National Biography [26], and I'd hope that this continues. As Martha Saxton said, 'integrating women's experience into broad subjects' is 'both more challenging intellectually and ultimately, more to the point of the overall project of bringing women into our acknowledged history'. [27]

But it's also clearly up to us to make a difference. If it's worth researching the life and achievements of a notable woman, it's worth making sure her contribution to history is available to the world, while also improving the quality of the world's biggest encyclopaedia. And it doesn't mean going it alone. It's still Women's History Month, so it's not too late to sign up and join one of the women's history projects, or to plan something with your students [28][29][30].

I'd like to close with quotes from two different women. Executive Director of the Wikimedia Foundation, Sue Gardner: 'Wikipedia will only contain "the sum of all human knowledge" if its editors are as diverse as the population itself: you can help make that happen. And I can't think of anything more important to do, than that.' [31]
 
And to quote Laura Mandell's keynote yesterday: 'Let's write and publish about each other's projects so that future historians will have those sources to write about. … Nothing changes through thinking alone, only through massive amounts of re-iteration'. [32]

[Update: based on questions afterwards, you may want to get started with Wikipedia:How to run an edit-a-thon, or sign up and say hello at Wikipedia:WikiProject Women's History. You could also join in the Global Women Wikipedia Write-In #GWWI on April 26 (1-3pm, US EST), which has a handy page on How to Create Wikipedia Entries that Will Stick.

And update April 30, 2013: check out 'Learning to work with Wikipedia – New Pages Patrol and how to create new Wikipedia articles that will stick' by the excellent Adrianne Wadewitz.

Update, June 9: if you're thinking of setting a class assignment involving editing Wikipedia, check out Wikipedia's 'For educators' and 'Assignment Design' pages for tips and contact points. June 18: see also Nicole Beale's 'Wikipedia for Regional Museums'.

Update, August 21, 2013: content on Wikipedia appears to have had an additional boost in Google's search results, making it even more important in shaping the world's knowledge. More at 'The Day the Knowledge Graph Exploded'.

New link, February 2014: Jacqueline Wernimont's Notes for #tooFEW Edit a thon based on a training session by Adrianne Wadewitz are a useful basic introduction to editing.]


References

[1] Various. ‘Wikipedia’. 2013. Wikipedia. http://en.wikipedia.org/wiki/Wikipedia.
[5] Barnett, Fiona. 2013. ‘#tooFEW – Feminists Engage Wikipedia’. HASTAC. March 11. http://hastac.org/blogs/fionab/2013/03/11/toofew-feminists-engage-wikipedia.
[6] Gobry, Pascal-Emmanuel. 2011. ‘Wikipedia Is Hampered By Its Huge Gender Gap’. Business Insider. January 31. http://www.businessinsider.com/wikipedia-is-hampered-by-its-huge-gender-gap-2011-1#.
[7] Cronon, William. 2012. ‘Scholarly Authority in a Wikified World’. Perspectives on History, American Historical Association. February 7. http://www.historians.org/perspectives/issues/2012/1202/Scholarly-Authority-in-a-Wikified-World.cfm.
[8] Saxton, Martha. 2012. ‘Wikipedia and Women’s History: A Classroom Experience’. Writing History in the Digital Age. http://writinghistory.trincoll.edu/crowdsourcing/saxton-etal-2012-spring/.
[9] Gobonobo. 2013. ‘User:Gobonobo/Gender Gap Red List’. Wikipedia. https://en.wikipedia.org/wiki/User:Gobonobo/Gender_Gap_red_list
[10] Various. ‘Hans Reck’. Wikipedia. https://en.wikipedia.org/wiki/Hans_Reck
[11] Mohr, B. A. R. 2010. Wives and daughters of early Berlin geoscientists and their work behind the scenes. Earth Sciences History 29 (2): 291–310.
[12] As commenter Operalala suggested, one challenge is recognising ‘the difference between the plurality of academia and the singularity of a Wikipedia article’. Comment http://chronicle.com/article/The-Undue-Weight-of-Truth-on/130704/#comment-437781354 on Messer-Kruse, Timothy. 2012. ‘The “Undue Weight” of Truth on Wikipedia’. The Chronicle of Higher Education. February 12. http://chronicle.com/article/The-Undue-Weight-of-Truth-on/130704/.
[13] Rosenzweig, Roy. 2006. ‘Can History Be Open Source? Wikipedia and the Future of the Past’. The Journal of American History 93 (1) (June): 117–46. https://chnm.gmu.edu/essays-on-history-new-media/essays/?essayid=42
[14] Operalala on Messer-Kruse, 2012.
[15] Cronon, 2012.
[16] Potter, Claire. 2013. ‘Looking for the Women on Wikipedia: Readers Respond’. The Chronicle of Higher Education. March 14. http://chronicle.com/blognetwork/tenuredradical/2013/03/looking-for-the-women-on-wikipedia-readers-respond/
[18] Abbate, Janet. 2003. ‘Guest Editor’s Introduction: Women and Gender in the History of Computing’. IEEE Annals of the History of Computing 25 (4) (October-December): 4–8.
[19] Messer-Kruse, 2012.
[20] Anderson, Jill. 2013. ‘A Supposedly Fun Thing I’ll (Probably) Never Do Again’. True Stories Backward. http://girlhistorian.wordpress.com/2013/03/16/a-supposedly-fun-thing-ill-probably-never-do-again/
[21] Messer-Kruse, 2012.
[22] Various. 2013. ‘Wikipedia:No Original Research’. Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:No_original_research
[23] Various. 2013. ‘Wikipedia:Notability (people)’. Wikipedia. http://en.wikipedia.org/wiki/Wikipedia:Notability_(people)
[24] Various. 2013. ‘Wikipedia:Notability’. Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:NOTE
[25] Or as Christie Aschwanden says when proposing the 'Finkbeiner test' for contemporary journalism about women in science, 'treating female scientists as special cases only perpetuates the idea that there’s something extraordinary about a woman doing science'. Aschwanden, Christie. 2013. ‘The Finkbeiner Test’. Double X Science. March 5. http://www.doublexscience.org/the-finkbeiner-test/
[26] For a recent example, see ‘An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?’ 2013. Six Degrees of Francis Bacon. March 8. http://sixdegreesoffrancisbacon.com/post/44879380376/an-entry-of-ones-own-or-why-are-there-so-few-women-in and ‘Gender and Name Recognition’. 2013. Six Degrees of Francis Bacon. March 20. http://sixdegreesoffrancisbacon.com/post/45833622936/gender-and-name-recognition
[27] Saxton, 2012
[29] Potter, Claire. 2013. ‘Prikipedia? Or, Looking for the Women on Wikipedia’. The Chronicle of Higher Education. March 10. http://chronicle.com/blognetwork/tenuredradical/2013/03/prikipedia-looking-for-the-women-on-wikipedia/
[30] For advice, see: Wikimedia Outreach. 2013. ‘Education Portal/Tips and Resources’. Wikimedia Outreach Wiki. http://outreach.wikimedia.org/wiki/Education_Portal/Tips_and_Resources
[31] A comment on Gardner, Sue. 2010. ‘Unlocking the Clubhouse: Five Ways to Encourage Women to Edit Wikipedia’. Sue Gardner’s Blog. November 14. http://suegardner.org/2010/11/14/unlocking-the-clubhouse-five-ways-to-encourage-women-to-edit-wikipedia/
[32] Mandell, Laura. 2013. ‘Feminist Critique vs. Feminist Production in Digital Humanities’. Keynote presented at the Women’s History in the Digital World conference, Bryn Mawr College, Pennsylvania. March 22.

Finding museum, digital humanities and public history projects and communities online

Every once in a while I see someone asking for sources on digital, participatory and social media projects around museums, public history, social history, etc., but I don't always have a moment to reply. To make it easier to help people, here's a quick collection of good places to get started.

I think the best source for museums and digital/social media projects is the site and community around the Museums and the Web conference, including 'Best of the Web' nominations and awards (2012-1997) and conference proceedings: 2012, 2011, 2010-1987.

Other projects might be listed at the new Digital Humanities Awards (nominations closed on the 11th so presumably they'll publish the list of nominees soon) or the (US) National Council on Public History Awards. The Digital Humanities conferences also include some social history, public history and participatory projects e.g. DH2012, as did the first Digital Humanities Australasia conference and the MCG's UK Museums on the Web conference reports.

To start finding online communities, look for people tweeting with #dhist, #digitalhumanities, #lodlam, #drinkingaboutmuseums, #musetech (and variations) or join the Museums Computer Group or the Museum Computer Network lists (or check their archives).

I'd like to add a list of museum bloggers (whether they focus on social media, technology, education, exhibition design, audience research, etc.) but don't know of any comprehensive, up-to-date lists (or Delicious tags, etc.). (Since I originally posted, @gretchjenn has pointed me to the new 'Meet a museum blogger' series and @alexandrematos told me about Cultural blogging in Europe, which includes a map of the European cultural blogging scene.) Where do you look for museum bloggers?

This is only a start, so please chip in! Add any resources I'm missing in the comments below, or tweet @mia_out.

The ever-morphing PhD

I wrote this for the NEH/Polis Summer Institute on deep mapping back in June, but I'm repurposing it as a quick PhD update as I review my call for interview participants. I'm in the middle of interviews at the moment (and if you're an academic historian working on British history 1600-1900 who might be willing to be interviewed, I'd love to hear from you), and after that I'll no doubt be taking stock of the research landscape and the findings from my interviews and project analyses, and updating the shape of my project as we go into the new year. So it doesn't quite reflect where I'm at now, but at the very least it's an insight into the difficulties of researching digital history methodologies when everything is changing so quickly:

"Originally I was going to build a tool to support something like crowdsourced deep mapping: a web application that would let people store and geolocate documents and images they were digitising. The questions that are particularly relevant for this workshop are: what happens when crowdsourcing or citizen history meets deep mapping? Can a deep map created by multiple people for their own research purposes support scholarly work? Can a synthetic, ad hoc collection of information be used to support an argument, or would it be just for the discovery of spatio-temporally relevant material? How would a spatial narrative layer work?

I planned to test this by mapping the lives and intellectual networks of early scientific women. But after conducting a big review of related projects I eventually realised that there's too much similar work going on in the field and that inevitably something similar would have been created by someone with more resources by the time I was writing up. So I had to rethink my question and my methods.

So now my PhD research seeks to answer 'how do academic and family/local historians evaluate, use and contribute to crowdsourced resources, especially geo-located historical materials?', with the goal of providing some insight into the impact of digitality on research practices and scholarship in the humanities. … How do trained and self-taught historians cope with changes in place names and boundaries over time, and with the many variations and similarities in place names? Does it matter if you've never been to the place and don't know that it might be that messy and complex?

I'm interested in how living in a digital culture affects how researchers work. What does it mean to generate as well as consume digital data in the course of research? How does user-created content affect questions of authorship, authority and trust for amateur historians and scholarly practice? What are the characteristics of a well-designed digital resource, and how can resources and tools for researchers be improved? It's a very Human-Computer Interaction/Informatics view of the digital humanities, but it addresses the issues around discoverability and usability that are so important for people building projects.

I'm currently interviewing academic, family and local historians, focusing on those researching people or places in early modern England, very loosely defined, as I'm covering 1600-1900. I'm asking them about the tools they currently use in their research; how they assess new resources; if or when they might use a resource created through crowdsourcing or user contributions (e.g. Wikipedia or ancestry.com); how they work out which online records to trust; and how they use place names or geographic locations in their research.

So far I've mostly analysed the interviews for how people think about crowdsourcing; I'll be focusing on the responses to place when I get back.

More generally, I'm interested in the idea of 'chorography 2.0' – what would it look like now? The abundance of information is as much of a problem as an opportunity: how to manage that?"

Museums, Libraries, Archives and the Digital Humanities – get involved!

The short version: if you've got ideas on how museums, libraries and archives (i.e. GLAM) and the digital humanities can inspire and learn from each other, it's your lucky day! Go add your ideas about concrete actions the Association for Computers and the Humanities can take to bring the two communities together or suggestions for a top ten 'get started in museums and the digital humanities' list (whether conference papers, journal articles, blogs or blog posts, videos, etc) to: 'GLAM and Digital Humanities together FTW'.

Update, August 23, 2012: the document is shaping up to be largely about ‘what can be done’ – which issues are shared by GLAMs and DH, how can we reach people in each field, what kinds of activities and conversations would be beneficial, how do we explain the core concepts and benefits of each field to the other? This suggests there’d be a useful second stage in focusing on filling in the detail around each of the issues and ideas raised in this initial creative phase. In the meantime, keep adding suggestions and sharing issues at the intersection of digital humanities and memory institutions.

A note on nomenclature: the genesis of this particular conversation was among museumy people so the original title of the document reflects that; it also reflects the desire to be practical and start with a field we knew well. The acronym GLAM (galleries, libraries, archives and museums) neatly covers the field of cultural heritage and the arts, but I'm never quite sure how effective it is as a recognisable call-to-action.  There's also a lot we could learn from the field of public history, so if that's you, consider yourself invited to the party!

The longer version: in an earlier post from July's Digital Humanities conference in Hamburg I mentioned that a conversation over twitter about museums and digital humanities led to a lunch with @ericdmj, @clairey_ross, @briancroxall and @amyeetx where we discussed simple ways to help digital humanists get a sense of what can be learnt from museums on topics like digital projects, audience outreach, education and public participation. It turns out the Digital Humanities community is also interested in working more closely with museums, as demonstrated by the votes for point 3 of the Association for Computers and the Humanities (ACH)'s 'Next Steps' document, "to explore relationships w/ DH-sympathetic orgs operating beyond the academy (Museum Computer Network, Nat'l Council on Public History, etc)". At the request of ACH's Bethany Nowviskie (@nowviskie) and Stéfan Sinclair (@sgsinclair), Eric D. M. Johnson and I had been tossing around some ideas for concrete next steps and working up to asking people working at the intersection of GLAM and DH for their input.

However, last night a conversation on twitter about DH and museums (prompted by Miriam Posner's tweet asking for input on a post 'What are some challenges to doing DH in the library?') suddenly took off so I seized the moment by throwing the outline of the document Eric and I had been tinkering with onto Google docs. It was getting late in the UK so I tweeted the link and left it so anyone could edit it. I came back the next morning to find lots of useful and interesting comments and additions and a whole list of people who are interested in continuing the conversation.  Even better, people have continued to add to it today and it's already a good resource.  If you weren't online at that particular time it's easy to miss it, so this post is partly to act as a more findable marker for the conversation about museums, libraries, archives and the digital humanities.

Explaining the digital humanities to GLAMs

This definition was added to the document overnight.  If you're a GLAM person, does it resonate with you or does it need tweaking?

"The broadest definition would be 1) using digital technologies to answer humanities research questions, 2) studying born digital objects as a humanist would have studied physical objects, and or 3) using digital tools to transform what scholarship is by making it more accessible on the open web."

How can you get involved?

Off the top of my head…

  • Add your name to the list of people interested in keeping up with the conversation
  • Read through the suggestions already posted; if you love an idea that's already there, say so!
  • Read and share the links already added to the document
  • Suggest specific events where GLAM and DH people can mingle and share ideas/presentations
  • Suggest specific events where a small travel bursary might help get conversations started
  • Offer to present on GLAMs and DH at an event
  • Add examples of digital projects that bridge the various worlds
  • Add examples of issues that bridge the various worlds
  • Write case studies that address some of the issues shared by GLAMs and DH
  • Spread the word via specialist mailing lists or personal contacts
  • Share links to conference papers, journal articles, videos, podcasts, books, blog posts, etc, that summarise some of the best ideas in ways that will resonate with other fields
  • Consider attending or starting something like Decoding Digital Humanities to discuss issues in DH. (If you're in or near Oxford and want to help me get one started, let me know!)
  • Something else I haven't thought of…

I'm super-excited about this because everyone wins when we have better links between museums and digital humanities. Personally, I've spent a decade working in various museums (and their associated libraries and archives) and my PhD is in Digital Humanities (or more realistically, Digital History), and my inner geek itches to find an efficient solution when I see each field asking some of the same questions, or asking questions the other field has been working to answer for a while.  This conversation has already started to help me discover useful synergies between GLAMs and DH, so I hope it helps you too.

Update, November 2012: as a result of discussions around this document/topic, the Museums Computer Group (MCG) and the Association for Computers and the Humanities (ACH) worked together to create 5 bursaries from the ACH for tickets to the MCG's UK Museums on the Web conference.

Catch the wind? (Re-post from Polis blog on Spatial Narratives and Deep Maps)

[This post was originally written for the Polis Center's blog.]

Our time at the NEH Institute on Spatial Narratives & Deep Maps is almost at an end.  The past fortnight feels both like it’s flown by and like we’ve been here for ages, which is possibly the right state of mind for thinking about deep maps.  After two weeks of debate deep maps still seem definable only when glimpsed in the periphery and yet not-quite defined when examined directly.  How can we capture the almost-tangible shape of a truly deep map that we can only glimpse through the social constructs, the particular contexts of creation and usage, discipline and the models in current technology?  If deep maps are an attempt to get beyond the use of location-as-index and into space-as-experience, can that currently be done more effectively on a screen or does covering a desk in maps and documents actually allow deeper immersion in a space at a particular time?

We’ve spent the past three days working in teams to prototype different interfaces to deep maps or spatial narratives, and each group presented their interfaces today. It’s been immensely fun and productive and also quite difficult at times.  It’s helped me realise that deep maps and spatial narratives are not dichotomous but exist on a scale – where do you draw the line between curating data sources and presenting an interpreted view of them?  At present, a deep map cannot be a recreation of the world, but it can be a platform for immersive thinking about the intersection of space, time and human lives.  At what point do you move from using a deep map to construct a spatial and temporal argument to using a spatial narrative to present it?

Our experience as the Broadway team reinforces Stuart’s point about the importance of the case study.  We uncovered foundational questions whilst deep in the process of constructing interfaces: is a deep map a space for personal exploration, comparison and analysis of sources, or is it a shared vision that is personalised through the process of creating a spatial narrative?  We also attempted to think through how multivocality translates into something on a screen, and how interfaces that can link one article or concept to multiple places might work in reality, and in the process re-discovered that each scholar may have different working methods, but that a clever interface can support multivocality in functionality as well as in content.

Halfway through 'deep maps and spatial narratives' summer institute

I'm a week and a bit into the NEH Institute for Advanced Topics in the Digital Humanities on 'Spatial Narrative and Deep Maps: Explorations in the Spatial Humanities', so this is a (possibly self-indulgent) post to explain why I'm over in Indianapolis and why I only seem to be tweeting with the #PolisNEH hashtag.  We're about to dive into three days of intense prototyping before wrapping things up on Friday, so I'm posting almost as a marker of my thoughts before the process of thinking-through-making makes me re-evaluate our earlier definitions.  Stuart Dunn has also blogged more usefully on Deep maps in Indy.

We spent the first week hearing from the co-directors David Bodenhamer (history, IUPUI), John Corrigan (religious studies, Florida State University), and Trevor Harris (geography, West Virginia University) and guest lecturers Ian Gregory (historical GIS and digital humanities, Lancaster University) and May Yuan (geonarratives, University of Oklahoma), and also from selected speakers at the Digital Cultural Mapping: Transformative Scholarship and Teaching in the Geospatial Humanities at UCLA. We also heard about the other participants' projects and backgrounds, and tried to define 'deep maps' and 'spatial narratives'.

It's been pointed out that as we're at the 'bleeding edge', visions for deep mapping are still highly personal. As we don't yet have a shared definition I don't want to misrepresent people's ideas by summarising them, so I'm just posting my current definition of deep maps:

A deep map contains geolocated information from multiple sources that convey their source, contingency and context of creation; it is both integrated and queryable through indexes of time and space.  

Essential characteristics: it can be a product, whether as a snapshot static map or as layers of interpretation with signposts and pre-set interactions and narrative, but is always visibly a process.  It allows open-ended exploration (within the limitations of the data available and the curation processes and research questions behind it) and supports serendipitous discovery of content. It supports curiosity. It supports arguments but allows them to be interrogated through the mapped content. It supports layers of spatial narratives but does not require them. It should be compatible with humanities work: it's citable (e.g. it provides a URL that shows the view used to construct an argument) and provides access to its sources, whether as data downloads or citations. It can include different map layers (e.g. historic maps) as well as different data sources. It could be topological as well as cartographic.  It must be usable at different scales: e.g. in the user interface, when zoomed out it provides a sense of the density of information within; e.g. as space, it can deal with different levels of granularity.
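As a rough illustration of the 'citable' requirement, here's a minimal Python sketch of how a deep map interface might encode the current view (centre, zoom, time range and visible layers) in a URL, so that a scholar can cite exactly what they were looking at when they made an argument. The base URL, parameter names and layer names are all hypothetical and aren't drawn from any existing deep map.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical viewer address; no real deep map lives here.
BASE_URL = "https://example.org/deepmap"


def view_to_url(lat, lon, zoom, start_year, end_year, layers):
    """Encode a map view (centre, zoom, time span, layers) as a citable URL."""
    params = {
        "lat": f"{lat:.5f}",
        "lon": f"{lon:.5f}",
        "zoom": str(zoom),
        "from": str(start_year),
        "to": str(end_year),
        # Sorted so that the same view always produces the same URL.
        "layers": ",".join(sorted(layers)),
    }
    return f"{BASE_URL}?{urlencode(params)}"


def url_to_view(url):
    """Recover the view parameters from a URL made by view_to_url."""
    query = parse_qs(urlparse(url).query)
    return {
        "lat": float(query["lat"][0]),
        "lon": float(query["lon"][0]),
        "zoom": int(query["zoom"][0]),
        "from": int(query["from"][0]),
        "to": int(query["to"][0]),
        # parse_qs drops blank values, so an empty layer list has no key.
        "layers": query["layers"][0].split(",") if "layers" in query else [],
    }
```

Sorting the layer names means the same view always produces the same URL, which matters if the link is going to end up in a footnote.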

Essential functions: it must be queryable and browseable.  It must support large, variable, complex, messy, fuzzy, multi-scalar data. It should be able to include entities such as real and imaginary people and events as well as places within spaces.  It should support both use for presentation of content and analytic use. It should be compelling – people should want to explore other places, times, relationships or sources. It should be intellectually immersive and support 'flow'.

Looking at it now, the first part is probably pretty close to how I would have defined it at the start, but my thinking about what this actually means in terms of specifications is the result of the conversations over the past week and the experience everyone brings from their own research and projects.

For me, this Institute has been a chance to hang out with ace people with similar interests and different backgrounds – it might mean we spend some time trying to negotiate discipline-specific language but it also makes for a richer experience.  It's a chance to work with wonderfully messy humanities data, and to work out how digital tools and interfaces can support ambiguous, subjective, uncertain, imprecise, rich, experiential content alongside the highly structured data GIS systems are good at.  It's also a chance to test these ideas by putting them into practice with a dataset on religion in Indianapolis and learn more about deep maps by trying to build one (albeit in three days).

As part of thinking about what I think a deep map is, I found myself going back to an embarrassingly dated post on ideas for location-linked cultural heritage projects:

I've always been fascinated with the idea of making the invisible and intangible layers of history linked to any one location visible again. Millions of lives, ordinary or notable, have been lived in London (and in your city); imagine waiting at your local bus stop and having access to the countless stories and events that happened around you over the centuries. … The nice thing about local data is that there are lots of people making content; the not nice thing about local data is that it's scattered all over the web, in all kinds of formats with all kinds of 'trustability', from museums/libraries/archives, to local councils to local enthusiasts and the occasional raving lunatic. … Location-linked data isn't only about official cultural heritage data; it could be used to display, preserve and commemorate histories that aren't 'notable' or 'historic' enough for recording officially, whether that's grime pirate radio stations in East London high-rise roofs or the sites of Turkish social clubs that are now new apartment buildings. Museums might not generate that data, but we could look at how it fits with user-generated content and with our collecting policies.

Amusingly, four years ago my obsession with 'open sourcing history' was apparently already well-developed and I was asking questions about authority and trust that eventually informed my PhD – questions I hope we can start to answer as we try to make a deep map.  Fun!

Finally, my thanks to the NEH and the Institute organisers and the support staff at the Polis Center and IUPUI for the opportunity to attend.

Slow and still dirty Digital Humanities Australasia notes: day 3

These are my very rough notes from day 3 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Quick and dirty Digital Humanities Australasia notes: day 2) held at the Australian National University in Canberra at the end of March.

We were welcomed to Day 3 by the ANU's Professor Marnie Hughes-Warrington (who expressed her gratitude for the methodological and social impact of digital humanities work) and Dr Katherine Bode.  The keynote was Dr Julia Flanders on 'Rethinking Collections', AKA 'in praise of collections'… [See also Axel Brun's live blog.]

She started by asking what we mean by a 'collection'. What's the utility of the term? What's the cultural significance of collections? The term speaks of agency, motive, and implies the existence of a collector who creates order through selectivity. Sites like eBay, Flickr and Pinterest are responding to a weirdly deep-seated desire to reassert the ways in which things belong together. The term 'collection' implies that a certain kind of completeness may be achieved. Each item is important in itself and also in relation to other items in the collection.

There's a suite of expected activities and interactions in the genre of digital collections, projects, etc. They're deliberate aggregations of materials that bear, and demand, individual scrutiny. Attention is given to the value of scale (and distant reading) which reinforces the aggregate approach…

She discussed the value of deliberate scope, deliberate shaping of collections, not craving 'everythingness'. There might also be algorithmically gathered collections…

She discussed collections she's involved with – TAPAS, DHQ, Women Writers Online – all using flavours of TEI, the same publishing logic and component stack, and providing the same functionality in the service of the same kinds of activities, though they work with different materials for different purposes.

What constitutes a collection? How are curated collections different to user-generated content or just-in-time collections? Back 'then', collections were things you wanted in your house or wanted to see in the same visit. What does the 'now' of collections look like? Decentralisation in collections 'now'… technical requirements are part of the intellectual landscape, part of larger activities of editing and design. A crucial characteristic of collections is the variety of philosophical urgency they respond to.

The electronic operates under the sign of limitless storage… potentially boundless inclusiveness. Design logic is a craving for elucidation, more context, the ability for the reader to follow any line of thought they might be having and follow it to the end. Unlimited informational desire, closing in of intellectual constraints. How do boundedness and internal cohesion help define the purpose of a collection? Deliberate attempt at genre not limited by technical limitations. Boundedness helps define and reflect philosophical purpose.

What do we model when we design and build digital collections? We're modelling the agency through which the collection comes into being and is sustained through usage. Design is a collection of representational practices, item selection, item boundaries and contents. There's a homogeneity in the structure, the markup applied to items. Item-to-item interconnections – there's the collection-level 'explicit phenomena' – the directly comparable metadata through which we establish cross-sectional views through the collection (eg by Dublin Core fields) which reveal things we already know about texts – authorship of an item, etc. There's also collection-level 'implicit phenomena' – informational commonalities, patterns that emerge or are revealed through inspection; change shape imperceptibly through how data is modelled or through software used [not sure I got that down right]; they're always motivated so always have a close connection with method.
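To make the idea of 'explicit phenomena' a little more concrete: a cross-sectional view through a collection can be as simple as grouping items by one shared metadata field. This is a minimal Python sketch under my own assumptions (records as dictionaries keyed by Dublin Core element names such as dc:creator, with made-up identifiers); it isn't how TAPAS, DHQ or Women Writers Online actually work.

```python
from collections import defaultdict


def cross_section(records, field):
    """Group item identifiers by the values of one Dublin Core field.

    Records are dicts keyed by element name (e.g. "dc:creator"); a field may
    hold a single string or a list of strings. Items lacking the field are
    left out of the view.
    """
    view = defaultdict(list)
    for record in records:
        values = record.get(field, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            view[value].append(record["dc:identifier"])
    return dict(view)
```

As Flanders pointed out, views like this mostly reveal things we already know (who wrote what); the 'implicit phenomena' are the patterns that only emerge through closer inspection.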

Readerly knowledge – what can the collection assume about what the reader knows? A table of contents is only useful if you can recognise the thing you want to find in it – they're not always self-evident. How does the collection's modelling affect us as readers? Consider the effects of choices on the intellectual ecology of the collection, including its readers. Readerly knowledge has everything to do with what we think we're doing in digital humanities research.

The Hermeneutics of Screwing Around (pdf). Searching produces a dynamically located just-in-time collection… Search is an annoying guessing game with a passive-aggressive collection. But we prefer to ask a collection to show its hand in a useful way (i.e. browse)… Search -> browse -> explore.

What's the cultural significance of collections? She referenced Liu's Sidney's Technology… A network as flow of information via connection, perpetually ongoing contextualisation; a patchwork is understood as an assemblage, it implies a suturing together of things previously unrelated. A patchwork asserts connections by brute force. A network assumes that connections are there to be discovered, connected to. Patchwork, mosaic – connects pre-existing nodes that are acknowledged to be incommensurable.

We avow the desirability of the network, yet we're aware of the itch of edge cases, data that can't be brought under rule. What do we treat as noise and what as signal, what do we deny is the meaning of the collection? Is exceptionality or conformance to type the most significant case? On twitter, @aylewis summarised this as 'Patchworking metaphor lets us conceptualise non-conformance as signal not noise'

Pay attention to the friction in the system, rather than smoothing it over. Collections both express and support analysis. Expressing theories of genre etc in internal modelling… Patchwork – the collection articulates the scholarly interest that animated its creation but also interests of the reader… The collection is animated by agency, is modelled by it, even while it respects the agency we bring as readers. Scholarly enquiry is always a transaction involving agency on both ends.

My (not very good) notes from discussion afterwards… there was a question about digital femmage; discussion of the tension between the desire for transparency and the desire to permit many viewpoints on material while not disingenuously disavowing the roles in shaping the collection; the trend at one point for factoids rather than narratives (but people wanted the editors' view as a foundation for what they do with that material); the logic of the network – a collection as a set of parameters not as a set of items; Alan Liu's encouragement to continue with theme of human agency in understanding what collections are about (e.g. solo collectors like John Soane); crowdsourced work is important in itself regardless of whether it comes up with the 'best' outcome, by whatever metric. Flanders: 'the commitment to efficiency is worrisome to me, it puts product over people in our scale of moral assessment' [hoorah! IMO, engagement is as important as data in cultural heritage]; a question about the agency of objects, with the answer that digital surrogates are carriers of agency, the question is how to understand that in relation to object agency?

GIS and Mapping I

The first paper was 'Mapping the Past in the Present' by Andrew Wilson, which was a fast run-through of some lovely examples based on Sydney's geo-spatial history. He discussed the spatial turn in history, and the mid-20thC shift to broader scales, territories of shared experience, and the on-going concern with the description of space, its experience and management.

He referenced Deconstructing the map, Harley, 1989, 'cartography is seldom what the cartographers say it is'. All maps are lies. All maps have to be read, closely or distantly. He referenced Grace Karskens' On the rocks and discussed the reality of maps as evidence, an expression of European expansion; the creation of the maps is an exercise in power. Maps must be interpreted as evidence. He talked about deriving data from historic maps, using regressive analysis to go back in time through the sources. He also mentioned TGIS – time-enabled GIS. Space-time composite model – when you have lots and lots of temporal changes, you create a polygon that describes every change in the sequence.
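My rough understanding of the space-time composite is that the area is split into the smallest units whose boundaries ever change, and each unit then carries its own history of attribute changes, so the map for any year can be rebuilt from those histories. Here's a minimal Python sketch of the attribute-history half of that idea; the class and function names are hypothetical, and it ignores the geometry work of intersecting polygons to create the units in the first place.

```python
from bisect import bisect_right


class CompositeUnit:
    """One spatial unit in a space-time composite, with its attribute history."""

    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.years = []   # sorted years at which the attribute changed
        self.values = []  # value in force from the matching year onwards

    def record_change(self, year, value):
        # Keep the history sorted even if changes are entered out of order.
        i = bisect_right(self.years, year)
        self.years.insert(i, year)
        self.values.insert(i, value)

    def value_at(self, year):
        """Return the attribute in force in the given year, or None if unknown."""
        i = bisect_right(self.years, year)
        return self.values[i - 1] if i else None


def snapshot(units, year):
    """Rebuild the whole map for one year from the composite's histories."""
    return {unit.unit_id: unit.value_at(year) for unit in units}
```

Storing each change once per unit, rather than a complete map per year, is what keeps the model manageable when there are 'lots and lots' of changes.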

The second paper was 'Reading the Text, Walking the Terrain, Following the Map: Do We See the Same Landscape?' by Øyvind Eide. He said that viewing a document and seeing a landscape are often represented as similar activities… but seeing a landscape means moving around in it, being an active participant. Wood (2010) on the explosion of maps around 1500 – part of the development of the modern state. We look at older maps through modern eyes – maps weren't made for navigation but to establish the modern state.

He's done a case study on text v maps in Scandinavia, 1740s. What is lost in the process of converting text to maps? Context, vagueness, under-specification, negation, disjunction… It's a combination of too little and too much. Text has information that can't fit on a map and text that doesn't provide enough information to make a map. Under-specification is when a verbal text describes a spatial phenomenon in a way that can be understood in two different ways by a competent reader. How do you map a negative feature of a landscape, i.e. things that are stated not to be there? 'Or' cannot be expressed on a map… Different media, different experiences – each can mediate only certain aspects of total reality (Ellestrom 2010).

The third paper was 'Putting Harlem on the Map' by Stephen Robertson. This article on 'Writing History in the Digital Age' is probably a good reference point: Putting Harlem on the Map; the site is at Digital Harlem. The project sources were police files, newspapers, organisational archives… They were cultural historians, focussed on individual level data, events, what it was like to live in Harlem. It was one of the first sites to employ the geo-spatial web rather than GIS software. Information was extracted and summarised from primary sources, [but] it wasn't a digitisation project. They presented their own maps and analysis apart from the site to keep it clear for other people to do their work.  After assigning a geo-location it is then possible to compare it with other phenomena from the same space. They used sources that historians typically treat as ephemera such as society or sports pages as well as the news in newspapers.

He showed a great list of event types they've gotten from the data… Legal categories disaggregate crime, so it appears more often in the list even though it was a minority of the data. Location types also offer a picture of the community.

Creating visualisations of life in the neighbourhood… when mapping at this detailed scale they were confronted with how vague most historical sources are and how they're related to other places. 'Historians are satisfied in most cases to say that a place is 'somewhere in Harlem'.' He talked about visualisations as 'asking, but not explaining, why there?'.

I tweeted that I'd gotten a lot more from his demonstration of the site than I had from looking at it unaided in the past, which led to a discussion with @claudinec and @wragge about whether the 'search vs browse' accessibility issue applies to geospatial interfaces as well as text or images (i.e. what do you need to provide on the first screen to help people get into your data project) and about the need for as many hooks into interfaces as possible, including narratives as interfaces.

Crowdsourcing was raised during the questions at the end of the session, but I've forgotten who I was quoting when I tweeted, 'by marginalising crowdsourcing you're marginalising voices', on the other hand, 'memories are complicated'.  I added my own point of view, 'I think of crowdsourcing as open source history, sometimes that's living memory, sometimes it's research or digitisation'.  If anything, the conference confirmed my view that crowdsourcing in cultural heritage generally involves participating in the same processes as GLAM staff and humanists, and that it shouldn't be exploitative or rely on user experience tricks to get participants (though having made crowdsourcing games for museums, I obviously don't have a problem with making the process easier to participate in).

The final paper I saw was Paul Vetch's 'Beyond the Lowest Common Denominator: Designing Effective Digital Resources'. He discussed the design tensions between: users, audiences (and 'production values'); ubiquity and trends; experimentation (and failure); sustainability (and 'the deliverable').

In the past digital humanities has compartmentalised groups of users in a way that's convenient but not necessarily valid. But funding pressure to serve wider audiences means anticipating lots of different needs. He said people make value judgements about the quality of a resource according to how it looks.

Ubiquity and trends: understanding what users already use; designing for intuition. Established heuristics for web design turn out to be completely at odds with how users behave.

Funding bodies expect deliverables, this conditions the way they design. It's difficult to combine: experimentation and high production values [something I've posted on before, but as Vetch said, people make value judgements about the quality of a resource according to how it looks so some polish is needed]; experimentation and sustainability…

Who are you designing for? Not the academic you're collaborating with, and it's not to create something that you as a developer would use. They're moving away from user testing at the end of a project to doing it during the project. [Hoorah!]

Ubiquity and trends – challenges include a very highly mediated environment; highly volatile and experimental… Trying to use established user conventions becomes stifling. (He called useit.com 'old nonsense'!) The ludic and experiential are increasingly important elements in how we present our research back.

Mapping Medieval Chester took technology designed for delivering contextual ads and used it to deliver information in context without changing perspective (i.e. without reloading the page, from memory).  The Gough map was an experiment in delivering a large image but also in making people smile.  Experimentation and failure… Online Chopin Variorum Edition was an experiment. How is the 'work' concept challenged by the Chopin sources? Technical methodological/objectives: superimposition; juxtaposition; collation/interpolation…

He discussed coping strategies for the Digital Humanities: accept and embrace the ephemerality of web-based interfaces; focus on process and experience – the underlying content is persistent even if the interfaces don't last.  I think this was a comment from the audience: 'if a digital resource doesn't last then it breaks the principle of citation – where does that leave scholarship?'

Summary

So those are my notes.  For further reference I've put a CSV archive of #DHA2012 tweets from searchhash.com here, but note it's not on Australian time so it needs transposing to match the session times.
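If you want to line the tweets up with the programme, the conversion is straightforward. This Python sketch assumes the archive's timestamps are in UTC, sit in a column called created_at and use a 'YYYY-MM-DD HH:MM:SS' format; all three are assumptions, so check them against the actual file. It uses a fixed UTC+11 offset because Canberra was still on daylight saving time (AEDT) at the end of March 2012.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumption: a fixed AEDT (UTC+11) offset. Daylight saving ended on
# 1 April 2012, so any later timestamps would need UTC+10 instead.
AEDT = timezone(timedelta(hours=11), "AEDT")


def to_canberra_time(csv_text, column="created_at", fmt="%Y-%m-%d %H:%M:%S"):
    """Yield each CSV row with an added 'local_time' column in Canberra time.

    Assumes the timestamps in `column` are UTC and formatted as `fmt`.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        utc_time = datetime.strptime(row[column], fmt).replace(tzinfo=timezone.utc)
        row["local_time"] = utc_time.astimezone(AEDT).strftime(fmt)
        yield row
```

A tweet stamped late on the evening of the 30th in UTC, for example, turns out to belong to the morning sessions of the 31st in Canberra.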

This was my first proper big Digital Humanities conference, and I had a great time.  It probably helped that I'm an Australian expat so I knew a sprinkling of people and had a sense of where various institutions fitted in, but the crowd was also generally approachable and friendly.

I was also struck by the repetition of phrases like 'the digital deluge', the 'tsunami of data' – I had the feeling there's a barely managed anxiety about coping with all this data. And if that's how people at a digital humanities conference felt, how must less-digital humanists feel?

I was pleasantly surprised by how much digital history content there was, and even more pleasantly surprised by how many GLAMy people were there, and consequently how much the experience and role of museums, libraries and archives was reflected in the conversations.  This might not have been as obvious if you weren't on twitter – there was a bigger disconnect between the back channel and conversations in the room than I'm used to at museum conferences.

As I mentioned in my day 1 and day 2 posts, I was struck by the statement that 'history is on a different evolutionary branch of digital humanities to literary studies', partly because even though I started my PhD just over a year ago, I've felt the title will be outdated within a few years of graduation.  I can see myself being more comfortable describing my work as 'digital history' in future.

I have to finish by thanking all the speakers, the programme committee, and in particular, Dr Paul Arthur and Dr Katherine Bode, the organisers and the aaDH committee – the whole event went so smoothly you'd never know it was the first one!

And just because I loved this quote, one final tweet from @mikejonesmelb: Sir Ken Robinson: 'Technology is not technology if it was invented before you were born'.

'…and they all turn on their computers and say 'yay!" (aka, 'mapping for humanists')

I'm spending a few hours of my Sunday experimenting with 'mapping for humanists' with an art historian friend, Hannah Williams (@_hannahwill).  We're going to have a go at solving some issues she has encountered when geo-coding addresses in 17th and 18th Century Paris, and we'll post as we go to record the process and hopefully share some useful reflections on what we found as we tried different tools.

We started by working out what issues we wanted to address.  After some discussion we boiled it down to two basic goals: a) to geo-reference historical maps so they can be used to geo-locate addresses and b) to generate maps dynamically from a list of addresses. This also means dealing with copyright and licensing issues along the way and thinking about how geospatial tools might fit into the everyday working practices of a historian.  (For example, while a tool like Google Refine can easily generate maps, is it usable for people who are more comfortable with Word than relying on cloud-based services like Google Docs?  And if copyright is a concern, is it as easy to put points on an OpenStreetMap as on a Google Map?)

Like many historians, Hannah's use of maps fell into two main areas: maps as illustrations, and maps as analytic tools.  Maps used for illustrations (e.g. in publications) are ideally copyright-free, or can at least be used as illustrative screenshots.  Interactivity is a lower priority for now as the dataset would be private until the scholarly publication is complete (owing to concerns about the lack of an established etiquette and format for citation and credit for online projects).

Maps used for analysis would ideally support layers of geo-referenced historic maps on top of modern map services, allowing historic addresses to be visually located via contemporaneous maps and geo-located via the link to the modern map.  Hannah has been experimenting with finding location data via old maps of Paris in Hypercities, but manually locating 18th Century streets on historic maps then matching those locations to modern maps is time-consuming and she suspects there are more efficient ways to map old addresses onto modern Paris.

Based on my research interviews with historians and my own experience as a programmer, I'd also like to help humanists generate maps directly from structured data (and ideally to store their data in user-friendly tools so that it's as easy to re-use as it is to create and edit).  I'm not sure if it's possible to do this from existing tools or whether they'd always need an export step, so one of my questions is whether there are easy ways to get records stored in something like Word or Excel into an online tool and create maps from there.  Some other issues historians face in using mapping include: imprecise locations (e.g. street names without house numbers); potential changes in street layouts between historic and modern maps; incomplete datasets; using markers to visually differentiate types of information on maps; and retaining descriptive location data and other contextual information.
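One low-tech route from Word or Excel to an online map might be to save the spreadsheet as CSV and convert it to GeoJSON, a format that many web mapping libraries can read. The sketch below is a hypothetical illustration, assuming invented column names (`address`, `lat`, `lon`, `precision`, `category`, `note`) and placeholder coordinates. It keeps the descriptive address and a precision flag with each point so imprecise locations stay visible, carries a category that a map could use to style markers, and sets aside rows without coordinates rather than dropping them silently.

```python
import csv
import io
import json

def rows_to_geojson(csv_text):
    """Convert CSV rows (e.g. a spreadsheet saved as CSV) into GeoJSON.

    Assumes hypothetical columns: address, lat, lon, precision, category, note.
    Rows without coordinates are returned separately so incomplete records
    are not silently lost.
    """
    features, unlocated = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = (row.get("lat") or "").strip()
        lon = (row.get("lon") or "").strip()
        if not lat or not lon:
            unlocated.append(row)
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": {
                # Keep the descriptive location text alongside the point.
                "address": row.get("address") or "",
                # e.g. "house" or "street" when no house number is known.
                "precision": row.get("precision") or "unknown",
                # A category lets a mapping tool style markers by type.
                "category": row.get("category") or "",
                "note": row.get("note") or "",
            },
        })
    return {"type": "FeatureCollection", "features": features}, unlocated

# Invented example rows; coordinates are illustrative placeholders.
sample = """address,lat,lon,precision,category,note
rue Saint-Honoré,48.8625,2.3370,street,workshop,no house number given
rue de Tournon,48.8497,2.3379,house,residence,
rue inconnue,,,unknown,residence,street not yet identified
"""

collection, missing = rows_to_geojson(sample)
print(json.dumps(collection, indent=2, ensure_ascii=False))
print(len(missing), "row(s) still need locating")
```

Whether a particular hosted mapping tool will accept a GeoJSON file directly would need checking tool by tool; the point of the sketch is that the spreadsheet stays the editable master copy and the map is regenerated from it.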

Because the challenge is to help the average humanist, I've assumed we should stay away from software that needs to be installed on a server, so to start with we're trying some of the web-based geo-referencing tools listed at http://help.oldmapsonline.org/georeference.

Geo-referencing tools for non-technical people

The first bump in the road was finding maps that are re-usable in technical and licensing terms so that we could link or upload them to the web tools listed at http://help.oldmapsonline.org/georeference.  We've fudged it for now by using a screenshot to try out the tools, but it's not exactly a sustainable solution.  
Hannah's been trying georeferencer.org, Hypercities and Heurist (thanks to Lise Summers @morethangrass on twitter) and has written up her findings at Hacking Historical Maps… or trying to.  Thanks also to Alex Butterworth @AlxButterworth and Joseph Reeves @iknowjoseph for suggestions during the day.

Yahoo! Mapmixer's page was a 404 – I couldn't find any reference to the service being closed, but I also couldn't find a current link for it.

Next I tried MetaCarta Labs' Map Rectifier.  Any maps uploaded to this service are publicly visible, and though the site says this does 'not grant a copyright license to other users', it also states that '[t]here is no expectation of privacy or protection of data', which may be a concern for academics negotiating the line between openness and protecting work-in-progress or anyone dealing with sensitive data.  Many of the historians I've interviewed for my PhD research feel that some sense of control over who can view and use their data is important, though the reasons why and how this is manifested vary.

Screenshot from http://labs.metacarta.com/rectifier/rectify/7192


The site has clear instructions – 'double click on the source map… Double click on the right side to associate that point with the reference map' – but the search within the right-hand 'reference map' didn't work and manually navigating to Paris, then the right section of Paris, was a huge pain.  Neither of the base maps seemed to have labels, so finding the right location at the right level of zoom was too hard and eventually I gave up.  Maybe the service isn't meant to deal with that level of zoom?  We were using a very small section of map for our trials.
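What these tools do with the paired clicks (control points) can be sketched as fitting a transform from image pixels to longitude and latitude. The minimal sketch below assumes the simplest case, a first-order (affine) transform fitted by least squares to three or more control points; real tools may use higher-order or other transforms and handle map projections, and the control points here are invented so the result can be checked. It also hints at why a tiny map extract is hard to work with: a few clicks that are slightly off have a large effect on the fit.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m·p = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    p = [0.0] * 3
    for r in range(2, -1, -1):
        p[r] = (a[r][3] - sum(a[r][c] * p[c] for c in range(r + 1, 3))) / a[r][r]
    return p

def fit_affine(control_points):
    """Fit lon = a*x + b*y + c and lat = d*x + e*y + f by least squares.

    control_points: list of ((x_pixel, y_pixel), (lon, lat)) pairs.
    """
    if len(control_points) < 3:
        raise ValueError("an affine fit needs at least three control points")
    # Normal equations (AᵀA)p = Aᵀt, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    at_lon, at_lat = [0.0] * 3, [0.0] * 3
    for (x, y), (lon, lat) in control_points:
        row = (x, y, 1.0)
        for i in range(3):
            at_lon[i] += row[i] * lon
            at_lat[i] += row[i] * lat
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, at_lon), solve3(ata, at_lat)

def pixel_to_lonlat(params, x, y):
    """Apply a fitted affine transform to one pixel position."""
    (a, b, c), (d, e, f) = params
    return a * x + b * y + c, d * x + e * y + f

# Invented control points generated from a known transform, so the fit can be checked:
# lon = 2.33 + 0.0001 * x, lat = 48.87 - 0.00007 * y (pixel y grows downwards).
points = [
    ((0, 0), (2.33, 48.87)),
    ((1000, 0), (2.43, 48.87)),
    ((0, 800), (2.33, 48.814)),
    ((1000, 800), (2.43, 48.814)),
]
params = fit_affine(points)
print(pixel_to_lonlat(params, 500, 400))  # approximately (2.38, 48.842)
```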

Inspired by Metacarta's Map Rectifier, Map Warper was written with OpenStreetMap in mind, which immediately helps us get closer to the goal of images usable in publications.  Map Warper is also used by the New York Public Library, which described it as a 'tool for digitally aligning ("rectifying") historical maps … to match today's precise maps'.  Map Warper also makes all uploaded maps public: 'By uploading images to the website, you agree that you have permission to do so, and accept that anyone else can potentially view and use them, including changing control points', but also offers 'Map visibility' options 'Public(default)' and 'Don't list the map (only you can see it)'.

Screenshot showing 'warped' historical map overlaid on OpenStreetMap at http://mapwarper.net/

Once a map is uploaded, it zooms to a 'best guess' location, presumably based on the information you provided when uploading the image.  It's a powerful tool, though I suspect it works better with larger images with more room for error.  Some of the functionality is a little obscure to the casual user – for example, the 'Rectify' view tells me '[t]his map either is not currently masked. Do you want to add or edit a mask now?' without explaining what a mask is.  However, I can live with some roughness around the edges because once you've warped your map (i.e. aligned it with a modern map), there's a handy link on the Export tab, 'View KML in Google Maps' that takes you to your map overlaid on a modern map.  Success!
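The 'View KML in Google Maps' link works because KML can describe an image draped over a modern map. As a rough illustration only (this is not Map Warper's actual export, and the image URL and bounding box are hypothetical), a KML GroundOverlay places an image within a north/south/east/west box:

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def ground_overlay_kml(name, image_url, north, south, east, west):
    """Return a KML document placing an image within a lat/lon bounding box."""
    ET.register_namespace("", KML_NS)  # serialise with KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    overlay = ET.SubElement(kml, f"{{{KML_NS}}}GroundOverlay")
    ET.SubElement(overlay, f"{{{KML_NS}}}name").text = name
    icon = ET.SubElement(overlay, f"{{{KML_NS}}}Icon")
    ET.SubElement(icon, f"{{{KML_NS}}}href").text = image_url
    box = ET.SubElement(overlay, f"{{{KML_NS}}}LatLonBox")
    for tag, value in (("north", north), ("south", south), ("east", east), ("west", west)):
        ET.SubElement(box, f"{{{KML_NS}}}{tag}").text = str(value)
    return ET.tostring(kml, encoding="unicode")

doc = ground_overlay_kml(
    "Extract of a historical map of Paris (hypothetical)",
    "https://example.org/warped-map.png",  # placeholder URL for a warped image
    north=48.87, south=48.814, east=2.43, west=2.33,
)
print(doc)
```

A simple bounding box like this assumes a north-up rectangle; once a map has been warped to fit modern geography, the warped image itself absorbs the distortion, which is presumably why tools export the warped image rather than the original scan.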

Sadly not all the export options seem to be complete (they weren't working on my map, anyway) so I couldn't work out if there was a non-geek friendly way to open the map in OpenStreetMap.

We have to stop here for now, but at this point we've met one of the goals – to geo-reference historical maps so locations from the past can be found in the present, but the other will have to wait for another day.  (But I'd probably start with openheatmap.com when we tackle it again.  Any other suggestions would be gratefully received!)

(The title quote is something I heard one non-geek friend say to another to explain what geeks get up to at hackdays. We called our experiment a 'hackday' because we were curious to see whether the format of a hackday – working to meet a challenge within set parameters in a short period of time – would work for other types of projects. While this ended up being almost an 'anti-hack', because I didn't want to write code unless we came across a need for a generic tool, the format worked quite well for getting us to concentrate solidly on a small set of problems for an afternoon.)

Quick and dirty Digital Humanities Australasia notes: day 2

What better way to fill in stopover time in Abu Dhabi than continuing to post my notes from DHA2012? [Though I finished off the post and re-posted once I was back home.] These are my very rough notes from day 2 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Slow and still dirty Digital Humanities Australasia notes: day 3). In the interests of speed I'll share my notes and worry about my own interpretations later.

Keynote panel, 'Big Digital Humanities?'

Day 2 was introduced by Craig Bellamy, and began with a keynote panel with Peter Robinson, Harold Short and John Unsworth, chaired by Hugh Craig. [See also Snurb's liveblogs for Robinson, Short and Unsworth.] Robinson asked 'what constitutes success for the digital humanities?' and further, what do the visible successes of digital humanities mask? He said it's harder for scholars to do high quality research with digital methods now than it was 20 years ago. But the answer isn't more digital humanists, it's having the ingredients to allow anyone to build bridges… He called for a new generation of tools and methods to support the scholarship that people want to do: 'It should be as easy to make a digital edition (of a document/book) as it is to make a Facebook page', and it shouldn't require collaboration with a digital humanist. To allow data made by one person to be made available to others, all digital scholarship should be made available under a Creative Commons licence (publishers can't publish it now if it's under a non-commercial licence), and digital humanities data should be structured and enriched with metadata and made available for re-use with other tools. The model for sustainability depends on anyone and everyone being able to access data.

Harold Short talked about big (or at least unescapable) data and the 'Svensson challenge' – rather than trying to work out how to take advantage of infrastructure created by and for the sciences, use your imagination to figure out what's needed for the arts and humanities. He called for a focus on infrastructure and content rather than 'data'.

John Unsworth reminded us that digital humanities is a certain kind of work in the humanities that uses computational methods as its research methods. It's not just using digital materials, though it does require large collections of data – it also requires a sense of how the tools work.

What is the digital humanities?

Very different versions of 'digital humanities' emerged through the panel and subsequent discussion, leaving me wondering how they related to the different evolutionary paths of digital history and digital literature studies mentioned the day before. Meanwhile, on the back channel (from the tweets that are to hand), I wondered if a two-tier model of digital humanities was emerging – one that uses traditional methods with digital content (DH lite?); another that disrupts traditional methods and values. Though thinking about it now, the 'tsunami' of data mentioned is disruptive in its own right, regardless of the intentional choices one makes about research practices (which might have been what Alan Liu meant when he asked about 'seamless' and 'seamful' views of the world)… On twitter, other people (@mikejonesmelb, @bestqualitycrab, @1n9r1d) wondered if the panel's interpretation of 'big' data was gendered, generational, sectoral, or shaped by any other combination of factors (such as the messiness and variability of historical data compared to literature) and whether it could have been about 'disciplinary breadth and inclusiveness' rather than scale.

Data morning session

The first speaker was Toby Burrows on 'Using Linked Data to Build Large‐Scale e‐Research Environments for the Humanities'. [Update: he's shared his slides and paper online and see also Snurb's liveblog.] Continuing some of the themes from the morning keynote panel, he said that the humanities have already been washed away in the digital deluge: the proliferation of digital stuff is beyond the capacity of individual researchers. It's difficult to answer complex humanities questions only using search with this 'industrialised' humanities data, but large-scale digital libraries and collections offer very little support for functions other than search. There's very little connection between the data that researchers are amassing and what institutions are amassing.

He's also been looking at historians' and humanists' research practices [and selfishly I was glad to see many parallels with my own early findings]. The tools may be digital rather than paper and scissors, but historians are still annotating and excerpting as they always have. The 'sharing' part of their work has changed the most – it's easier to share, and they can share at an earlier stage if they choose to do that, but not a lot has changed at the personal level.

Burrows said applying a linked data approach to manuscript research would go a long way to addressing the complexity of the field. For example, using global URIs for manuscripts and parts; separating names and concepts from descriptive information; and using linked data functions to relate scholarly activities (annotations, excerpts, representations etc) to manuscript descriptions, objects and publications. Linked data can provide a layer of entities that sits between research activities and descriptions/collections/publications, which avoids conflating the entities and the source material. Multiple naming schemes are necessary for describing entities and relationships – there's no single authoritative vocabulary. It's a permanent work in progress, with no definitive or final structure. Entities need to include individuals as well as categories, with a network graph showing relatedness and the evidence for that relatedness as the basic structure.
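To make the structure Burrows described a little more concrete, here is a hypothetical sketch, not HuNI's data model: it mints URIs for a manuscript and one of its parts under an assumed placeholder base URI, keeps names in a separate label table that allows several naming schemes, and stores each relationship together with its evidence. A real project would presumably use RDF and established vocabularies rather than this hand-rolled structure; every identifier and label below is invented.

```python
from collections import defaultdict

# Hypothetical base URI; a real project would mint stable, resolvable URIs.
BASE = "https://example.org/id/"

class EntityGraph:
    """Entities identified by URIs, labels kept apart from descriptions,
    and relationships that record their supporting evidence."""

    def __init__(self):
        self.labels = defaultdict(dict)  # uri -> {naming scheme: label}
        self.links = []                  # (subject, predicate, object, evidence)

    def add_label(self, uri, scheme, label):
        # No single authoritative vocabulary: each scheme keeps its own label.
        self.labels[uri][scheme] = label

    def relate(self, subject, predicate, obj, evidence):
        # Every relationship carries the sources that support it.
        self.links.append((subject, predicate, obj, tuple(evidence)))

    def related(self, uri):
        """List (predicate, other entity, evidence) for links touching uri."""
        out = []
        for s, p, o, ev in self.links:
            if s == uri:
                out.append((p, o, ev))
            elif o == uri:
                out.append((p, s, ev))
        return out

manuscript = BASE + "manuscript/42"
folio = BASE + "manuscript/42/folio/3r"  # a part gets its own URI
scribe = BASE + "person/7"
annotation = BASE + "annotation/101"

g = EntityGraph()
g.add_label(scribe, "catalogue-a", "Hand B")
g.add_label(scribe, "project-b", "Anonymous scribe")
g.relate(folio, "isPartOf", manuscript, ["catalogue description"])
g.relate(scribe, "wrote", folio, [annotation])  # the annotation is the evidence
g.relate(annotation, "annotates", folio, ["researcher's note"])

for predicate, other, evidence in g.related(folio):
    print(predicate, other, "evidence:", evidence)
```

Keeping the evidence on each link, rather than on the entities, is one way of representing the 'contested ground' raised below: two researchers can assert different relationships for the same entities, each with its own sources.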

He suggested a focus on organising knowledge, not collections, whether objects or texts. Collaborative activities should be based around this knowledge, using tools that work with linked data entities. This raised the issue of contested ground and the application of labels and meaning to data: your 'discovery' is my 'invasion'. This makes citizen humanities problematic – who gets to describe, assign, link, and what does that mean for scholarly authority?

My notes aren't clear but I think Burrows said these ideas were based on analysis of medieval manuscript research, which Jane Hunter had also worked on, and they were looking towards the architecture for HuNI. It was encouraging to see an approach to linked data so grounded in the complexity of historians' research practices and data, and it's yet another reason I'm looking forward to following HuNI's progress – I think it will have valuable lessons for linked data projects in the rest of the world. [These slides from the Linked Open Data workshop in Melbourne a few weeks later show the academic workflow HuNI plans to support and some of the issues they'll have to tackle.]

The second speaker was the University of Sydney's Stephen Hayes on 'how linked is linked enough?'. [See also Snurb's liveblog.] He's looking at projects through a linked data lens, trying to assess how much further projects need to go to comfortably claim to be linked data. He talked about the issues projects encountered in trying to reach 5-star Linked Data.
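The '5 stars' refer to Tim Berners-Lee's cumulative scheme for open linked data, where each star requires all the ones below it. A minimal sketch of that scoring, with a hypothetical project's flags; Hayes' half stars suggest he gave credit for partial compliance, which this whole-star sketch does not attempt to model.

```python
# Tim Berners-Lee's 5-star open data scheme; each level builds on the one before.
CRITERIA = (
    ("on_web_open_licence", "on the web under an open licence"),
    ("machine_readable", "structured, machine-readable data"),
    ("non_proprietary_format", "non-proprietary format, e.g. CSV rather than Excel"),
    ("uses_uris", "W3C standards such as RDF, with URIs identifying things"),
    ("links_to_other_data", "links to other people's data for context"),
)

def star_rating(project):
    """Count stars cumulatively, stopping at the first unmet criterion."""
    stars = 0
    for key, _description in CRITERIA:
        if not project.get(key, False):
            break
        stars += 1
    return stars

# A hypothetical project that publishes RDF but doesn't yet link outwards.
example = {
    "on_web_open_licence": True,
    "machine_readable": True,
    "non_proprietary_format": True,
    "uses_uris": True,
    "links_to_other_data": False,
}
print(star_rating(example))  # 4
```

The cumulative rule is why the fifth star depends on other institutions: a project cannot link to collections that nobody has exposed yet.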

He looked at projects like the Dictionary of Sydney, which expresses data as RDF as well as in a public-facing HTML interface and comes close to winning 5 stars. It is a demonstration of the fact that once data is expressed in one form, it can be easily expressed in another form – stable entities can be recombined to form new structures. The project is powered by Heurist, a tool for managing a wide range of research data. The History of Balinese Painting could not find other institutions that exposed Balinese collection data in programmable form so they could link to them (presumably a common problem for early adopters but at least it helps solve the 'chicken or the egg' problem that dogs linked data in cultural heritage and the humanities). The site's URLs don't return useful metadata but they do try to refer to image URLs so it's 'sorta persistent'. He gave it a rating of 3.5 stars. Other projects mentioned (also built on Heurist?) were the Charles Harpur Critical Archive, rated at 3.5 stars and Virtual Zagora, rated at 3 stars.
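The point that one stable entity can be expressed in several forms can be shown with a single record rendered twice: once as N-Triples (a plain-text RDF serialisation) and once as HTML. This is a hypothetical sketch, not how Heurist or the Dictionary of Sydney actually work; the URI is a placeholder and the two predicates (`rdfs:label`, `dcterms:description`) are simply common choices.

```python
import html

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DC_DESCRIPTION = "http://purl.org/dc/terms/description"

def nt_literal(text):
    # N-Triples string literals must escape backslashes, quotes and newlines.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(entity):
    """Machine-facing form: one triple per line, each ending in ' .'."""
    uri = entity["uri"]
    return "\n".join([
        f"<{uri}> <{RDFS_LABEL}> {nt_literal(entity['label'])} .",
        f"<{uri}> <{DC_DESCRIPTION}> {nt_literal(entity['description'])} .",
    ])

def to_html(entity):
    """Human-facing form of the same record, linking back to its stable URI."""
    return (
        "<article>\n"
        f"  <h1>{html.escape(entity['label'])}</h1>\n"
        f"  <p>{html.escape(entity['description'])}</p>\n"
        f"  <p><a href=\"{html.escape(entity['uri'])}\">Stable identifier</a></p>\n"
        "</article>"
    )

entity = {
    "uri": "https://example.org/entity/123",  # placeholder URI
    "label": "Circular Quay",
    "description": "A harbourside area of central Sydney.",
}
print(to_ntriples(entity))
print(to_html(entity))
```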

The paper was an interesting discussion of the team work required to get the full 5 stars of linked data, and the trade-offs in developing functions for structured data (e.g. implementing schema.org's painting markup versus focussing on the quality of the human-facing pages); reassuring curators about how much data would be released and what would be kept back; developing ontologies throughout a project or in advance; and the overhead in mapping other projects' concepts to their own version of Dublin Core.
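For a sense of what 'implementing schema.org's painting markup' might involve, here is a minimal sketch that builds JSON-LD for embedding in a human-facing page. The record values are hypothetical, and the properties used (`name`, `creator`, `dateCreated`, `url`, `image`) are general CreativeWork properties rather than anything the project is known to have used.

```python
import json

def painting_jsonld(title, artist, date_created, page_url, image_url=None):
    """Build schema.org JSON-LD describing one painting (hypothetical fields)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Painting",
        "name": title,
        "creator": {"@type": "Person", "name": artist},
        "dateCreated": date_created,
        "url": page_url,
    }
    if image_url:  # only include an image when one is available
        data["image"] = image_url
    return json.dumps(data, indent=2, ensure_ascii=False)

markup = painting_jsonld(
    title="Temple scene (hypothetical record)",
    artist="Unknown artist",
    date_created="1935",
    page_url="https://example.org/paintings/17",
)
# The JSON-LD is typically embedded in the HTML page inside a script element.
print('<script type="application/ld+json">')
print(markup)
print("</script>")
```

The trade-off Hayes described shows up even here: every extra property means deciding which data curators are comfortable releasing, on top of the work on the human-facing page.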

The final paper in the session was 'As Curious An Entity: Building Digital Resources from Context, Records and Data' by Michael Jones and Antonina Lewis (abstract). [See also Snurb's liveblog.] They said that improving the visibility of relationships between entities enriches archives, as does improving relationships between people. The title quote in full is 'as curious an entity as bullshit writ on silk' – if the parameters, variables and sources of data are removed from material, then it's just bullshit written on silk. Visualisations remove sources, complexity and 'relative context', and would be richer if they could express changes in data over time and space. They asked: how would one know that information presented in a visualisation is accurate if it doesn't cite sources? You must seek and reference original material to support context layers.
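One way to stop a visualisation stripping out its sources is to build them into the data structure itself. A minimal, hypothetical sketch: each data point keeps its source references, and the function that prepares rows for display refuses points that have none. The labels, values and box/folder references are invented.

```python
from dataclasses import dataclass, field

@dataclass
class DataPoint:
    label: str
    value: float
    # Each point keeps references to the original material it came from.
    sources: list = field(default_factory=list)

def chart_rows(points):
    """Return display rows, each with its citations.

    Points without sources are rejected rather than silently shown.
    """
    uncited = [p.label for p in points if not p.sources]
    if uncited:
        raise ValueError(f"no sources recorded for: {', '.join(uncited)}")
    return [f"{p.label}: {p.value} [{'; '.join(p.sources)}]" for p in points]

points = [
    DataPoint("Survey A", 42.0, ["Box 12, folder 3"]),
    DataPoint("Survey B", 37.5, ["Box 12, folder 4", "Donor interview notes"]),
]
for line in chart_rows(points):
    print(line)
```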

They presented an overview of the Saulwick Archive project (Saulwick ran polls for the Fairfax newspapers for years) and the Australian Women's Register, discussed common issues faced in digital humanities, and the role of linked data and human relationships in building digital resources. They discussed the value of maintaining relationships between archives and donors after the transfer of material, and the need to establish data management plans to make provision for raw data and authoritative versions of related contextual material, and to retain data to make sense of the archives in the future. The Australian Women's Register includes content written for the site and links out to the archival repositories and libraries where the records are held. In a lovely phrase, they described records as the 'evidential heart' for the context and data layers. They also noted that the keynote overlooked non-academic re-use of digital resources, but it's another argument for making data available where possible.

Digital histories session

The first paper was 'Community Connections: The Renaissance of Local History' by Lisa Murray. Murray discussed the 'three Cs' needed for local history: connectivity, community, collaboration.

Is the process of geo-referencing forcing historians to be more specific about when or where things happened? Are people going from the thematic to the particular? Is it exciting for local historians to see how things fit into state or national narratives? Digital history has enormous potential for local and family history and to represent complicated relationships within a community and how they've changed over time. Digital history doesn't have to be article-centric – it enables new forms of presentation. Historians have to acknowledge that Wikipedia is aligned to historians' processes. Local history is strongly represented on Wikipedia. The Dictionary of Sydney provides a universal framework for accessing Sydney's history.

The democratisation of historical production is exciting but raises challenges for public understandings of how history is undertaken and represented. Are some histories privileged? Making History (a project by Museum Victoria and Monash University) encourages the use of online resources but does that privilege digitised sources, and will others be neglected? Are easily accessible sources privileged, and does that change what history is written? What about community collections or vast state archives that aren't digitised?

History research methodologies are changing – Google etc. is shaping how research is undertaken; the ubiquity of keyword searching reinforces the primacy of names. She noted the impact of family historians on how archives prioritise work. It's not just about finding sources – to produce good history you need to analyse the sources. Professional historians are no longer the privileged producers of knowledge. History can be parochial and inclusive, but it can also lack a sense of historical perspective and context. Digital history production amplifies tensions between popular history and academic history [and presumably between amateur and academic historians?].

Apparently primary school students study more local history than university students do. Local and community history is produced by a broad spectrum of the community but relatively few academic historians are participating. There's a risk of favouring quirky facts over significance and context. Unless history is more widely taught, local history will be tarred with the same brush as antiquarianism. History is not only about narrative and context… Historians need to embrace the renaissance of local and community history.

In the questions there was some discussion of the implications of Sydney's city archives being moved to a more inconvenient physical location. The justification is that it's available through Ancestry but that removes it from all context [and I guess raises all the issues of serendipity etc in digital vs physical access to archives].

The next speaker was Tim Sherratt on 'Inside the bureaucracy of White Australia'. His slides are online and his abstract is on the Invisible Australians site. The Invisible Australians project is trying to answer the question of what the White Australia policy looked like to a non-white Australian.  He talked about how digital technology can help explore the practice of exclusion as legislation and administrative processes were gradually elaborated. Chinese Australians who left Australia and wanted to return had to prove both their identity and their right to land to convince officials they could return: 'every non-white resident was potentially a prohibited immigrant just waiting to be exposed'. He used topic modelling on file titles from archival series and was able to see which documents related to the White Australia policy. This is a change from working through the hierarchical structures of archives to working directly through the content of archives. It gives a better picture of what hasn't survived and what's missing, and would have many other exciting uses. [His post on Topic modelling in the archives explains it better than my summary would.]
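For readers unfamiliar with topic modelling, a toy illustration of the technique may help: Latent Dirichlet Allocation fitted by collapsed Gibbs sampling on a handful of short titles. This is not Sherratt's code or tooling (his post describes his actual approach), and the titles below are invented; with real archival series one would work with thousands of titles and a tested library rather than this sketch.

```python
import random
import re
from collections import Counter

STOPWORDS = {"for", "from", "re", "the", "of", "and", "to"}

def tokenise(title):
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (doc-topic, topic-word) counts."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    ndk = [[0] * num_topics for _ in docs]        # topic counts per document
    nkw = [Counter() for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                         # total tokens per topic
    z = []                                        # topic assignment per token
    for d, doc in enumerate(docs):                # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Conditional probability of each topic given all other tokens.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw

# Invented file titles, loosely in the style of administrative series.
titles = [
    "Certificate exempting from dictation test",
    "Application for certificate exempting from dictation test",
    "Dictation test exemption certificate",
    "Wharf repairs tender correspondence",
    "Correspondence re wharf repairs",
    "Tender for wharf lighting",
]
docs = [tokenise(t) for t in titles]
ndk, nkw = lda_gibbs(docs, num_topics=2, iterations=100)
for k, counts in enumerate(nkw):
    print(f"topic {k}:", [w for w, c in counts.most_common() if c > 0][:4])
for title, counts in zip(titles, ndk):
    print(counts.index(max(counts)), title)  # dominant topic per title
```

The last loop is the step that corresponds to finding 'which documents related to' a theme: once a topic's top words look like the administration of a policy, the titles where that topic dominates can be pulled out for closer reading.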

The final paper was Paul Turnbull on 'Pancake history'. He noted that in e-research there's a difference between what you can use in teaching and what makes people nervous in the research domain. He finds it ironic that professional advancement for historians is tied to writing about doing history rather than doing history. He talked about the need to engage with disciplinary colleagues who don't engage with digital humanities, and issues around historians taking digital history seriously.

Sherratt's talk inspired discussion of funding small-scale as well as large-scale infrastructure, possibly through crowdfunding. Turnbull also suggested 'seeding ideas and sharing small apps is the way to go'.

[Note from when I originally posted this: I don't know when my flight is going to be called, so I'll hit publish now and keep working until I board – there's lots more to fit in for day 2! In the afternoon I went to the 'Digital History' session. I'll tidy up when I'm in the UK as I think blogger is doing weird LTR things because it may be expecting Arabic.]

See also Slow and still dirty Digital Humanities Australasia notes: day 3.