From piles of material to patchwork: How do we embed the production of usable collections data into library work?

These notes were prepared for a panel discussion at the ‘Always Already Computational: Collections as Data’ (#AACdata) workshop, held in Santa Barbara in March 2017. While my latest thinking on the gap between the scale of collections and the quality of data about them is informed by my role in the Digital Scholarship team at the British Library, I’ve also drawn on work with catalogues and open cultural data at Melbourne Museum, the Museum of London, the Science Museum and various fellowships. My thanks to the organisers and the Institute of Museum and Library Services for the opportunity to attend. My position paper was called ‘From libraries as patchwork to datasets as assemblages?’ but in hindsight, piles and patchwork of material seemed a better analogy.

The invitation to this panel asked us to share our experience and perspective on various themes. I’m focusing on the challenges in making collections available as data, based on years of working towards open cultural data from within various museums and libraries. I’ve condensed my thoughts about the challenges down into the question on the slide: How do we embed the production of usable collections data into library work?

It has to be usable, because if it’s not then why are we doing it? It has to be embedded, because data in one-off projects gets isolated and stale. ‘Production’ is there because infrastructure and workflows are unsexy but necessary for access to the material that makes digital scholarship possible.

One of the biggest issues the British Library (BL) faces is scale. The BL’s collections are vast – maybe 200 million items – and extremely varied. My experience shows that publishing datasets (or sharing them with aggregators) exposes the shortcomings of past cataloguing practices, making the size of the backlog all too apparent.

Good collections data (or metadata, depending on how you look at it) is necessary to avoid the overwhelmed, jumble-sale feeling of using a huge aggregator like Europeana, Trove, or the DPLA, where you feel there’s treasure within reach, if only you could find it. Publishing collections online often increases the number of enquiries about them – how can institutions deal with enquiries at scale when they already have a cataloguing backlog? Computational methods like entity identification and extraction could complement the ‘gold standard’ cataloguing already in progress. If they’re made widely available, these other methods might help bridge the resourcing gaps that mean it’s easier to find items from richer institutions and countries than from poorer ones.

[Photo of piles of material]

You probably already all know this, but it’s worth remembering: our collections aren’t even (yet) a patchwork of materials. The collections we hold, and the subset we can digitise and make available for re-use, are only a tiny proportion of what once existed. Each piece was once part of something bigger, and what we have now has been shaped by cumulative practical and intellectual decisions made over decades or centuries. Digitisation projects range from tiny specialist databases to huge commercial genealogy deals, while some areas of the collections don’t yet have digital catalogue records. Some items can’t be digitised because they’re too big, small or fragile for scanning or photography; others can’t be shared because of copyright, data protection or cultural sensitivities. We need to be careful in how we label datasets so that the absences are evident.

(Here, ‘data’ may include various types of metadata, automatically generated OCR or handwritten text recognition transcripts, digital images, audio or video files, crowdsourced enhancements, or any combination of these and more.)


In addition to the incompleteness or fuzziness of catalogue data, when collections appear as data, it’s often as great big lumps of things. It’s hard for most scholars to process (or even just unzip) 4GB of data.

Currently, datasets are often created outside normal processes, and over time they become ‘stale’ as they’re not updated when source collection records change. And once researchers manage to unzip them, the records rely on internal references – name authorities for people, places, etc. – that can only be seen as strings rather than things until extra work is undertaken.
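To illustrate the ‘strings versus things’ problem, here’s a minimal sketch of turning name strings in exported records into authority URIs. The records and the lookup table are invented examples; real workflows would use VIAF, Wikidata or an institution’s own authority files.

```python
# Hypothetical local authority file mapping name strings to identifiers.
# The VIAF URIs are illustrative examples, not a claim about catalogue data.
AUTHORITY = {
    "Austen, Jane": "http://viaf.org/viaf/102333412",
    "Dickens, Charles": "http://viaf.org/viaf/88666393",
}

records = [
    {"title": "Emma", "creator": "Austen, Jane"},
    {"title": "Bleak House", "creator": "Dickens, Charles"},
    {"title": "Anonymous pamphlet", "creator": "Unknown"},
]

def link_creators(records, authority):
    """Attach an authority URI where the creator string matches; leave None where it doesn't."""
    for record in records:
        record["creator_uri"] = authority.get(record["creator"])
    return records

linked = link_creators(records, AUTHORITY)
# Records without a match are the 'extra work' left for humans or smarter tools.
unmatched = [r["title"] for r in linked if r["creator_uri"] is None]
```

Even a toy version like this makes the gap visible: the unmatched list is exactly the residue that exact string matching can’t resolve.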

The BL’s metadata team have experimented with ‘researcher format’ CSV exports around specific themes (e.g. an exhibition), and CSV is undoubtedly the most accessible format – but what we really need is the ability for people to create their own queries across catalogues, and create their own datasets from the results. (And by queries I don’t mean SPARQL, but rather faceted browsing or structured search forms.)
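The idea of a self-service ‘researcher format’ export can be sketched in a few lines: apply facet filters to catalogue records, then serialise the matches as CSV. The field names and records are invented for illustration, not the BL’s actual schema.

```python
import csv
import io

# Invented miniature catalogue for illustration.
catalogue = [
    {"id": "1", "title": "Map of London", "type": "map", "date": "1850"},
    {"id": "2", "title": "Letter to a friend", "type": "manuscript", "date": "1850"},
    {"id": "3", "title": "Map of York", "type": "map", "date": "1900"},
]

def export_csv(records, facets):
    """Keep records matching every facet (field == value), then write them as CSV text."""
    matches = [r for r in records
               if all(r.get(field) == value for field, value in facets.items())]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "title", "type", "date"])
    writer.writeheader()
    writer.writerows(matches)
    return out.getvalue()

# A 'faceted browse' selection: maps from 1850.
csv_text = export_csv(catalogue, {"type": "map", "date": "1850"})
```

The point isn’t the code, it’s the workflow: the facets come from a structured search form, and the CSV is generated on demand rather than hand-curated once and left to go stale.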


Collections are huge (and resources relatively small) so we need to supplement manual cataloguing with other methods. Sometimes the work of crafting links from catalogues to external authorities and identifiers will be a machine job, with pieces sewn together at industrial speed via entity recognition tools that can pull categories out of text and images. Sometimes it’s operated by a technologist who runs records through OpenRefine to find links to name authorities or Wikidata records. Sometimes it’s a labour of scholarly love, with links painstakingly researched, hand-tacked together to make sure they fit before they’re finally recorded in a bespoke database.
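As a toy stand-in for that machine job, here’s gazetteer-based entity spotting over OCR’d text. Real pipelines would use an NER model or OpenRefine’s reconciliation against Wikidata; the text and gazetteer here are invented.

```python
# Hypothetical gazetteer of known names and their categories.
GAZETTEER = {
    "London": "place",
    "Charles Dickens": "person",
    "British Library": "organisation",
}

def spot_entities(text, gazetteer):
    """Return (name, category) pairs for gazetteer entries that appear in the text."""
    return [(name, category) for name, category in gazetteer.items()
            if name in text]

text = "Charles Dickens gave readings in London throughout the 1850s."
entities = spot_entities(text, GAZETTEER)
```

A real system would also handle spelling variants, disambiguation and context, which is precisely why the ‘industrial speed’ output still needs the qualifiers discussed below.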

This linking work often happens outside the institution, so how can we ingest and re-use it appropriately? And if we’re to take advantage of computational methods and external enhancements, then we need ways to signal which categories were applied by cataloguers, which by software and which by external groups.

The workflow and interface adjustments required would be significant, but even more challenging would be the internal conversations and changes required before a consensus on the best way to combine the work of cataloguers and computers could emerge.

The trick is to move from a collection of pieces to pieces of a collection. Every collection item was created in and about places, and produced by and about people. They have creative, cultural, scientific and intellectual properties. There’s a web of connections from each item that should be represented when they appear in datasets. These connections help make datasets more usable, turning strings of text into references to things and concepts to aid discoverability and the application of computational methods by scholars. This enables structured search across datasets – potentially linking an oral history interview with a scientist in the BL sound archive, their scientific publications in journals, annotated transcriptions of their field notebooks from a crowdsourcing project, and published biography in the legal deposit library.
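That ‘web of connections’ can be sketched as cross-dataset grouping by shared identifier: records from different (invented) collections carry the same person URI, so they can be assembled into one view of that person. The URIs and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical shared identifier for the scientist in the example.
PERSON = "http://example.org/person/scientist-1"

# Invented records from different collections, linked by 'about' URIs.
datasets = [
    {"source": "sound archive", "title": "Oral history interview", "about": PERSON},
    {"source": "journals", "title": "Paper on crystallography", "about": PERSON},
    {"source": "crowdsourcing", "title": "Field notebook transcription", "about": PERSON},
    {"source": "maps", "title": "Map of Sussex", "about": "http://example.org/place/sussex"},
]

def group_by_subject(records):
    """Gather record titles under each shared subject URI."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["about"]].append(record["title"])
    return dict(grouped)

web = group_by_subject(datasets)
```

With strings, these four records sit in four silos; with shared URIs, a structured search for the person returns the interview, the paper and the notebook together.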

A lot of this work has already been done as authority files like AAT and ULAN are applied in cataloguing, so our attention should now turn to converting local references into URIs and making the most of that investment.

Applying identifiers is hard – it takes expert care to disambiguate personal names, places and concepts, even with all the hinting that context-aware systems might provide as machine learning techniques improve. Catalogues can’t easily record possible attributions, and there’s understandable reluctance to publish an imperfect record, so progress on the backlog is slow. If we’re not to be held back by the need for records to be perfectly complete before they’re published, then we need to design systems capable of capturing the ambiguity, fuzziness and inherent messiness of historical collections, and of allowing qualified descriptors for possible links to people, places and so on. Then we need to explain the difference to users, so that they don’t rely too heavily on our descriptions, or make assumptions about the presence or absence of information when it’s not appropriate.
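One way to capture that ambiguity is a record structure where each candidate link carries an explicit qualifier rather than being forced into a single ‘perfect’ attribution. The qualifiers, URIs and record below are invented for illustration.

```python
# Invented record with qualified, competing attributions.
record = {
    "title": "Portrait of an unknown sitter",
    "possible_creators": [
        {"uri": "http://example.org/person/artist-a", "qualifier": "probably"},
        {"uri": "http://example.org/person/artist-b", "qualifier": "possibly"},
    ],
}

def best_candidates(record, accepted=("certain", "probably")):
    """Surface only attributions strong enough to show by default,
    while keeping weaker candidates in the record for researchers."""
    return [c["uri"] for c in record["possible_creators"]
            if c["qualifier"] in accepted]

shown = best_candidates(record)
```

The display layer can then explain the qualifiers to users, so a ‘possibly’ never silently hardens into a fact.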


[Photo of pipes over a building]

A lot of what we need relies on more responsive infrastructure for workflows and cataloguing systems. For example, the BL’s systems are designed around the ‘deliverable unit’ – the printed or bound volume, the archive box – because for centuries the reading room was where you accessed items. We now need infrastructure that makes items addressable at the manuscript, page and image level in order to make the most of the annotations and links created to shared identifiers.

(I’d love to see absorbent workflows, soaking up any related data or digital surrogates that pass through an organisation, no matter which system they reside in or originate from. We aren’t yet making the most of OCRd text, let alone enhanced data from other processes, to aid discoverability or produce datasets from collections.)

My final thought – we can start small and iterate, which is just as well, because we need to work on understanding what users of collections data need and how they want to use them. We’re making a start, and there’s a lot of thoughtful work behind the scenes, but perhaps a little more investment is needed for research libraries to become as comfortable with data users as they are with the readers who pass through their physical doors.

All the things I didn’t say in my welcome to UKMW14 ‘Museums beyond the web’…

Here are all the things I (probably) didn’t say in my Chair’s welcome for the Museums Computer Group annual conference… Other notes, images and tweets from the day are linked from ‘UKMW14 round-up: posts, tweets, slides and images‘.

Welcome to MCG’s UKMW14: Museums beyond the web! We’ve got great speakers lined up, and we’ve built in lots of time to catch up and get to know your peers, so we hope you’ll enjoy the day.

It’s ten years since the MCG’s Museums on the Web became an annual event, and it’s 13 years since it was first run in 2001. It feels like a lot has changed since then, but, while the future is very definitely here, it’s also definitely not evenly distributed across the museum sector. It’s also an interesting moment for the conference, as ‘the web’ has broadened to include ‘digital’, which in turn spans giant distribution networks and tiny wearable devices. ‘The web’ has become a slightly out-dated shorthand term for ‘audience-facing technologies’.

When looking back over the last ten years of programmes, I found myself thinking about planetary orbits. Small planets closest to the sun whizz around quickly, while the big gas giants move incredibly slowly. If technology start-ups are like Mercury, completing a year in just 88 Earth days, and our audiences are firmly on Earth time, museum time might be a bit closer to Mars, taking two Earth years for each Mars year, or sometimes even Jupiter, completing a circuit once every twelve years or so.

But museums aren’t planets, so I can only push that metaphor so far. Different sections of a museum move at different speeds. Heroic front-of-house staff can observe changes in audience behaviour on a daily basis and social media platforms can be adopted overnight, but websites might only be redesigned every few years, and galleries are only updated every few decades (if you’re lucky). For a long time it felt like museums were using digital platforms to broadcast at audiences without really addressing the challenges of dialogue or collaborating with external experts.

But at this point, it seems that, finally, working on digital platforms like the web has pushed museums to change how they work. On a personal level, the need for specific technical skills hasn’t changed, but more content, education and design jobs work across platforms, are consciously ‘multi-channel’, and are audience- rather than platform-centred in their focus. Web teams seem to be settling into public engagement, education, marketing and other departments as the idea of a ‘digital’ department slowly becomes an oxymoron. Frameworks from software development are slowly permeating organisations that used to think in terms of print runs and physical gallery construction. Short rounds of agile development are replacing the ‘build and abandon after launch’ model, voices from a range of departments are replacing the disembodied expert voice, and catalogues are becoming publications that change over time.

While many of us here are comfortable with these webby methods, how will we manage the need to act as translators between digital and museums while understanding the impact of new technologies? And how can we help those who are struggling to keep up, particularly with the impact of the cuts?

Today is a chance to think about the technologies that will shape the museums of the future. What will audiences want from us? Where will they go looking for information and expertise, and how much of that information and expertise should be provided by museums? How can museums best provide access to their collections and knowledge over the next five, ten years?

We’re grateful to our sponsors, particularly as their support helps keep ticket prices affordable. Firstly I’d like to thank our venue sponsors, the Natural History Museum. Secondly, I’d like to thank Faversham & Moss for their sponsorship of this conference. Go chat to them and find out more about their work!

Sharing is caring keynote ‘Enriching cultural heritage collections through a Participatory Commons’

Enriching cultural heritage collections through a Participatory Commons platform: a provocation about collaborating with users

Mia Ridge, Open University Contact me: @mia_out or

[I was invited to Copenhagen to talk about my research on crowdsourcing in cultural heritage at the 3rd international Sharing is Caring seminar on April 1. I’m sharing my notes in advance to make life easier for those awesome people following along in a second or third language, particularly since I’m delivering my talk via video.]

Today I’d like to present both a proposal for something called the ‘Participatory Commons’ and a provocation (or conversation starter): there’s a paradox in our hopes for deeper audience engagement through crowdsourcing – projects that don’t grow with their participants will lose them as they develop new skills and interests and move on. This talk presents some options for dealing with this paradox and suggests that a Participatory Commons provides a way to take a sector-wide view of active engagement with heritage content and redefine our sense of what it means when everybody wins.

I’d love to hear your thoughts about this – I’ll be following the hashtag during the session and my contact details are above.

Before diving in, I wanted to reflect on some lessons from my work in museums on public engagement and participation.

My philosophy for crowdsourcing in cultural heritage (aka what I’ve learnt from making crowdsourcing games)

One thing I learnt over the past years: museums can be intimidating places. When we ask for help with things like tagging or describing our collections, people want to help but they worry about getting it wrong and looking stupid or about harming the museum.

The best technology in the world won’t solve a single problem unless it’s empathically designed and accompanied by social solutions. This isn’t a talk about technology, it’s a talk about people – what they want, what they’re afraid of, how we can overcome all that to collaborate and work together.

Dora’s Lost Data

So a few years ago I explored the potential of crowdsourcing games to make helping a museum less scary and more fun. In this game, ‘Dora’s Lost Data’, players meet a junior curator who asks them to tag objects so they’ll be findable in Google. Games aren’t the answer to everything, but identifying barriers to participation is always important. You have to understand your audiences – their motivations for starting and continuing to participate, and the fears, anxieties and uncertainties that prevent them participating. [My games were hacked together outside of work hours; more information is available at My MSc dissertation: crowdsourcing games for museums; if you’d like to see properly polished metadata games, check out Tiltfactor’s]

Mutual wins – everybody’s happy

My definition of crowdsourcing: cultural heritage crowdsourcing projects ask the public to undertake tasks that cannot be done automatically, in an environment where the activities, goals (or both) provide inherent rewards for participation, and where their participation contributes to a shared, significant goal or research area.

It helps to think of crowdsourcing in cultural heritage as a form of volunteering. Participation has to be rewarding for everyone involved. That sounds simple, but focusing on the audiences’ needs can be difficult when there are so many organisational needs competing for priority and limited resources for polishing the user experience. Further, as many projects discover, participant needs change over time…

What is a Participatory Commons and why would we want one?

First, I have to introduce you to some people. These are composite stories (personas) based on my research…

Two archival historians, Simone and Andre. Simone travels to archives in her semester breaks to stock up on research material, taking photos of most documents ‘in case they’re useful later’ and transcribing key text from others. Andre is often at the next table, also looking for material for his research. The documents he collected for his last research project would be useful for Simone’s current book, but they’ve never met and he has no way of sharing that part of his ‘personal research collection’ with her. Currently, each of these highly skilled researchers takes their cumulative knowledge away with them at the end of the day, leaving no trace of their work in the archive itself. Next…

Two people from a nearby village, Martha and Bob. They joined their local history society when they retired and moved to the village. They’re helping find out what happened to children from the village school’s class of 1898 in the lead-up to and during World War I. They are using census returns and other online documents to add records to a database the society’s secretary set up in Excel. Meanwhile…

A family historian, Daniel. He has a classic ‘shoebox archive’ – a box containing his grandmother Sarah’s letters and diary, describing her travels and everyday life at the turn of the century. He’s transcribing them and wants to put them online to share with his extended family. One day he wants to make a map for his kids that shows all the places their great-grandmother lived and visited. Finally, there’s…

Crowdsourcer Nisha. She has two young kids and works for a local authority. She enjoys playing games like Candy Crush on her mobile, and after the kids have gone to bed she transcribes ship logs on the Old Weather website while watching TV with her husband. She finds it relaxing, feels good about contributing to science and enjoys the glimpses of life at sea. Sites like Old Weather use ‘microtasks’ – tiny, easily accomplished tasks – and crowdsourcing to digitise large amounts of text.

Helping each other?

None of our friends above know it, but they’re all looking at material from roughly the same time and place. Andre and Simone could help each other by sharing the documents they’ve collected over the years. Sarah’s diaries include the names of many children from her village that would help Martha and Bob’s project, and Nisha could help everyone if she transcribed sections of Sarah’s diary.

Connecting everyone’s efforts for the greater good: Participatory Commons

This image shows the two main aspects of the Participatory Commons: the different sources for content, and the activities that people can do with that content.

The Participatory Commons (image: Mia Ridge)

The Participatory Commons is a platform where content from different sources can be aggregated. Access to shared resources underlies the idea of the ‘Commons’, particularly material that is not currently suitable for sites like Europeana, like ‘shoebox archives’ and historians’ personal record collections. So if the ‘Commons’ part refers to shared resources, how is it participatory?

The Participatory Commons interface supports a range of activities, from the types of tasks historians typically do, like assessing and contextualising documents, through activities that specialists or the public can do, like identifying particular people, places, events or things in sources, to typical crowdsourcing tasks like full-text transcription or structured tagging.

By combining the energy of crowdsourcing with the knowledge historians create on a platform that can store or link to primary sources from museums, libraries and archives with ‘shoebox archives’, the Commons could help make our shared heritage more accessible to all. As a platform that makes material about ordinary people available alongside official archives and as an interface for enjoyable, meaningful participation in heritage work, the Commons could be a basis for ‘open source history’, redressing some of the absences in official archives while improving the quality of all records.

As a work in progress, this idea of the Participatory Heritage Commons has two roles: an academic thought experiment to frame my research, and a provocation for GLAMs (galleries, libraries, archives, museums) to think outside their individual walls. As a vision for ‘open source history’, it’s inspired by community archives, public history, participant digitisation and history from below… This combination of a large underlying repository and more intimate interfaces could be quite powerful. Capturing some of the knowledge generated when scholars access collections would benefit both archives and other researchers.

‘Niche projects’ can be built on a Participatory Commons

As a platform for crowdsourcing, the Participatory Commons provides efficiencies of scale in the backend work for verifying and validating contributions, managing user accounts, forums, etc. But that doesn’t mean that each user would experience the same front-end interface.
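One piece of that shared back-end can be sketched concretely: validating contributions by agreement between independent contributors. The threshold and transcriptions below are invented; real platforms tune these rules per task type.

```python
from collections import Counter

def validate(transcriptions, threshold=3):
    """Return the agreed transcription if any version reaches the threshold
    of independent matching contributions, otherwise None (task stays open)."""
    if not transcriptions:
        return None
    text, count = Counter(transcriptions).most_common(1)[0]
    return text if count >= threshold else None

# Three contributors agree; one made an OCR-style slip ('Dee' for 'Dec').
votes = ["HMS Beagle, 27 Dec 1831", "HMS Beagle, 27 Dec 1831",
         "HMS Beagle, 27 Dec 1831", "HMS Beagle, 27 Dee 1831"]
agreed = validate(votes)
```

Running this kind of logic once, centrally, is exactly the efficiency of scale argued for above: every niche project layered on the Commons gets validation, accounts and forums for free.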

Niche projects build on the Participatory Commons
(quick and dirty image: Mia Ridge)

My research so far suggests that tightly-focused projects are better able to motivate participants and create a sense of community. These ‘niche’ projects may be related to a particular location, period or topic, or to a particular type of material. The success of the New York Public Library’s What’s on the Menu project, designed around a collection of historic menus, and the British Library’s GeoReferencer project, designed around their historic map collection, both demonstrate the value of defining projects around niche topics.

The best crowdsourcing projects use carefully designed interactions tailored to the specific content, audience and data requirements of a given project. For example, the Zooniverse projects use much of the same underlying software, but each is designed around specific tasks on specific types of material, whether classifying simple galaxy types, plankton or animals on the Serengeti, or transcribing ship logs or military diaries.

The Participatory Commons is not only a collection of content, it also allows ‘niche’ projects to be layered on top, presenting more focused sets of content, and specialist interfaces designed around the content, audience and purpose.


But there are still many barriers to consider, including copyright and technical issues and important cultural issues around authority, reliability, trust, academic credit and authorship. [There’s more background on this at my earlier post on historians and the Participatory Commons and Early PhD findings: Exploring historians’ resistance to crowdsourced resources.]

Now I want to set the idea of the Participatory Commons aside for a moment, and return to crowdsourcing in cultural heritage. I’ve been looking for factors in the success or otherwise of crowdsourcing projects, from grassroots, community-led projects to big, glamorous, institutionally-led sites.

I mentioned that Nisha found transcribing text relaxing. Like many people who start transcribing text, she found herself getting interested in the events, people and places mentioned in the text. Forums or other methods for participants to discuss their questions seem to help keep participants motivated, and they also provide somewhere for a spark of curiosity to grow (as in this forum post). We know that some people on crowdsourcing projects like Old Weather get interested in history, and even start their own research projects.

Crowdsourcing as gateway to further activity

You can see that happening on other crowdsourcing projects too. For example, Herbaria@Home aims to document historical herbarium collections within museums based on photographs of specimen cards. So far participants have documented over 130,000 historic specimens. In the process, some participants also found themselves becoming interested in the people whose specimens they were documenting.

As a result, the project has expanded to include biographies of the original specimen collectors. It was able to accommodate this new interest through a project wiki, which has a combination of free text and structured data linking records between the transcribed specimen cards and individual biographies.

‘Levels of Engagement’ in citizen science

There’s a consistent enough pattern in science crowdsourcing projects that a model from ‘citizen science’ outlines different stages participants can move through, from undertaking simple tasks, through joining in community discussion, to ‘working independently on self-identified research projects’.[1]

Is this ‘mission accomplished’?

This is Nick Poole’s word cloud based on 40 museum mission statements. With words like ‘enjoyment’, ‘access’ and ‘learning’ appearing in museum missions, doesn’t this mean that turning transcribers into citizen historians while digitising and enhancing collections is a success? Well, yes, but…

Paths diverge; paradox ahead?

There’s a tension between GLAMs’ desire to invite people to ‘go deeper’, to find their own research interests, to begin to become citizen historians; and the desire to ask people to help with tasks set by GLAMs to support their work. Heritage organisations can try to channel that impulse to start research into questions about their own collections, but sometimes it feels like we’re asking people to do our homework for us. The scaffolds put in place to make tasks easier may start to feel like a constraint.

Who has agency?

If people move beyond simple tasks into more complex tasks that require a greater investment of time and learning, then issues of agency – participants’ ability to make choices about what they’re working on and why – start to become more important. Would Wikipedia have succeeded if it dictated what contributors had to write about? We shouldn’t mistake volunteers for a workforce just because they can be impressively dedicated contributors.

Participatory project models

Turning again to citizen science – this time public participation in science research, we have a model for participatory projects according to the amount of control participants have over the design of the project itself – or to look at it another way, how much authority the organisation has ceded to the crowd. This model contains three categories: ‘contributory’, where the public contributes data to a project designed by the organisation; ‘collaborative’, where the public can help refine project design and analyse data in a project lead by the organisation; and ‘co-creative’, where the public can take part in all or nearly all processes, and all parties design the project together.[2]

As you can imagine, truly co-creative projects are rare. It seems cultural organisations find it hard to truly collaborate with members of the public, for many understandable reasons. The level of transparency required and the investment of time for negotiating mutual interests, goals and capabilities increase as collaboration deepens. Institutional constraints and lack of time to engage in deep dialogue with participants make it difficult to find shared goals that work for all parties. It seems GLAMs sometimes try to take shortcuts and end up making decisions for the group, which means their ‘co-creative’ project is really just ‘collaborative’.

New challenges

When participants start to outgrow the tasks that originally got them hooked, projects face a choice. Some projects are experimenting with setting challenges for participants. Here you see ‘mysteries’ set by the UK’s Museum of Design in Plastics, and by San Francisco Public Library on Historypin. Finding the right match between the challenge set and the object can be difficult without some existing knowledge of the collection, and it can require a lot of ongoing time to encourage participants. Putting the mystery under the nose of the person who has the knowledge or skills to solve it is another challenge that projects like this will have to tackle.

Working with existing communities of interest is a good start, but it also takes work to figure out where they hang out online (or in-person) and understand how they prefer to work. GLAMs sometimes fall into the trap of choosing the technology first, or trying something because it’s trendy; it’s better to start with the intersection between your content and the preferences of potential audiences.

But is it wishful thinking to hope that others will be interested in answering the questions GLAMs are asking?

A tension?

Should projects accept that some people will move on as they develop new interests, and concentrate on recruiting new participants to replace them? Do they try to find more interesting tasks or new responsibilities for participants, such as helping moderate discussions, or checking and validating other people’s work? Or should they find ways for the project to grow as participants’ skills and knowledge increase? It’s important to make these decisions mindfully, as the default is otherwise to accept a level of turnover as participants move on.

To return to lessons from citizen science, possible areas for deeper involvement include choosing or defining questions for study, analysing or interpreting data and drawing conclusions, discussing results and asking new questions.[3] However, heritage organisations might have to accept that the questions people want to ask might not involve their collections, and that these citizen historians’ new interests might not leave time for their previous crowdsourcing tasks.

Why is a critical mass of content in a Participatory Commons useful?

And now we return to the Participatory Commons and the question of why a critical mass of content would be useful.

Increasingly, the old divisions between museum, library and archive collections don’t make sense. For most people, content is content, and they don’t understand why a pamphlet about a village fete in 1898 would be described and accessed differently depending on whether it had ended up in a museum, library or archive catalogue.

Basing niche projects on a wider range of content creates opportunities for different types of tasks and levels of responsibility. Projects that provide a variety of tasks and roles can support a range of different levels and types of participant skills, availability, knowledge and experience.

A critical mass of material is also important for the discoverability of heritage content. Even the most sophisticated researcher turns to Google sometimes, and if your content doesn’t come up in the first few results, many researchers will never know it exists. It’s easy to say but less easy to make a reality: the easier it is to find your collections, the more likely it is that researchers will use them.

Commons as party?

More importantly, a critical mass of content in a Commons allows us to re-define ‘winning’. If participation is narrowly defined as belonging to individual GLAMs, when a citizen historian moves on to a project that doesn’t involve your collection then it can seem like you’ve lost a collaborator. But the people who developed a new research interest through a project at one museum might find they end up using records from the archive down the road, and transcribing or enhancing those records during their investigation. If all the institutions in the region shared their records on the Commons or let researchers take and share photos while using their collections, the researcher has a critical mass of content for their research and, hopefully as a side-effect, their activities will improve links between collections. If the Commons allows GLAMs to take a sector-wide view, then someone moving on to a different collection becomes a moment to celebrate, a form of graduation. In our wildest imagination, the Commons could be like a fabulous party where you never know what interesting people and things you’ll discover…

To conclude – by designing platforms that allow people to collect and improve records as they work, we’re helping everybody win.

Thank you! I’m looking forward to hearing your thoughts.

[1] M. Jordan Raddick et al., ‘Citizen Science: Status and Research Directions for the Coming Decade’, in astro2010: The Astronomy and Astrophysics Decadal Survey, vol. 2010, 2009.

[2] Rick Bonney et al., Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report (Washington D.C.: Center for Advancement of Informal Science Education (CAISE), July 2009).

[3] Bonney et al., Public Participation in Scientific Research.

Image credits in order of appearance: Glider, Library of Congress; Great hall, Library of Congress; Curzona Allport, Tasmanian Archive and Heritage Office; Hålanda Church, Västergötland, Sweden, Swedish National Heritage Board; Postmaster General James A. Farley During National Air Mail Week, 1938, Smithsonian Institution; Canterbury Bankstown Rugby League Football Club’s third annual Ball, Powerhouse Museum.

‘Bringing maker culture to cultural organisations’ at VALA2014

I’ve just spent a week in Melbourne (my home town, awww) for VALA2014. VALA is about ‘libraries, technology and the future’ and the conference theme for 2014 was ‘streaming with possibilities’. Kim Tairi‘s briefing (as Chair of the VALA2014 Conference Programme Committee) included the phrases ‘stories that will ignite, challenge and excite our audience’ and ‘don’t be afraid to be controversial or push the boundaries’, which was a brilliant challenge and turned out to be a great introduction to the ethos of the conference.

Image by Con Wiebrands 萬事如意 @flexnib

My keynote was on ‘Bringing maker culture to cultural organisations’. From my abstract: Should museums, libraries and archives be places for looking at old stuff other people have made, or could they also be places where new creations are inspired and made? If making – writing, designing, building – is the deepest level of engagement with heritage and culture, how can memory institutions avoid the comforting but deadly trap of broadcasting at the public and instead create spaces for curating, creating or conversing with them? Somehow that meant a romp through banana pianos, the link between knitting and historic newspapers, why I like coding, the value of tinkering, secret shoppers and the fact that everyone is a maker of some sort (or was in the past).

Update: videos of the keynotes are now available online! I haven’t watched any cos I don’t have the Silverlight. I’d recommend them all, but I’m particularly looking forward to re-watching Gene Tan and Matt Finch‘s keynotes.

I’m sharing my slides below, but Slideshare seems to have stopped including the speaker notes so they’re best viewed in conjunction with either of the two blog posts about my keynote that appeared with impressive speed or the tweets from my session. I’ve storified the tweets at Tweets from keynote ‘Bringing maker culture to cultural organisations’ at VALA14 – the audience did a fantastic job of summarising my speech, adding their own questions and comments, and sharing links to the sites and projects I mentioned. Yay, librarians! The two posts are Deborah ‘@deborahfitchett‘ Fitchett’s Bringing maker culture to cultural organisations and Richard ‘@penanghill‘ Hayward’s Mia Ridge on the Maker Movement (on an unrelated-but-home town note, Richard was my boss many, many years ago!).

Bringing maker culture to cultural organisations from Mia

Huge thanks to the organisers for the invitation to speak, to the conference staff for making everything run so smoothly, to the other keynotes for their inspiration and to the attendees for being such good sports.

‘An (even briefer) history of open cultural data’ at GLAM-Wiki 2013

These are some of my notes for my invited plenary talk at GLAM-Wiki 2013 (Galleries, Libraries, Archives, Museums & Wikimedia, #GLAMWiki), held at the British Library on April 12-13, 2013. I don’t think I stuck that closely to them on the day, and in the interests of brevity I’ve left out the ‘timeline’ bits (but you can read about some of them in a related MuseumID article, ‘Where next for open cultural data in museums?‘) to focus on the lessons to be learnt from changes so far. There were lots of great talks and discussion at the event, you can view some of the presentations on Wikimedia UK’s YouTube channel.

A (now very) brief history of open cultural data

Firstly, thank you for the invitation to speak… This morning I want to highlight some key moments of change in the history of open cultural data – a history not only of licenses and data, but also of conversations, standards, and collaborations, of moments where things changed… I’ve included key moments from funders, legislative influences and the commercial sector too, as they create the context in which change happens and often have an effect on what’s considered possible. I’ll close by considering some of the lessons learnt.

[Please help improve this talk]

A caveat – there may well be a bias towards the English-speaking world (and to museums, because of my background). If you know of an open GLAM (gallery, library, archive, museum) data source I’ve missed, you can add it to the open cultural data/GLAM API wiki… or Lotte Belice’s timeline of open culture milestones.


‘Open cultural data’ is data from cultural institutions that is made available for use in a machine-readable format under an open licence. But each of the words ‘open’, ‘cultural’ and ‘data’ is slightly more complicated than it looks, so I’ll unpack them a little…


Office clerks, FNV. Voorlichting.

While the degree of openness required to be ‘open’ data can be contentious, at its simplest, ‘open’ refers to content that is available for use outside the institution that created it, whether for school homework projects, academic monographs or mobile phone apps. ‘Open’ may refer to licences that clarify the permissions and restrictions placed on data, or to the use of non-proprietary digital technologies, or ideally, to a combination of both open licences and technologies.

Ideally, open data is freely available for use and redistribution by anyone for any purpose, but in reality there are often restrictions. GLAMs may limit commercial use by licensing content for ‘non-commercial use only’, but as there is no clear definition of ‘non-commercial use’ in Creative Commons licences, some developers may choose not to risk using a dataset with an unclear licence. GLAMs may also release data for commercial use but still require attribution, either to help retain the provenance of the content, to help people find their way to related content or just because they’d like some credit for their work. GLAMs might also release data under custom licences that deal with their specific circumstances, but they are then difficult to integrate with content from other openly-licensed datasets.

Hybrid licensing models are a pragmatic solution for the current environment. They at least allow some use and may contribute to greater use of open cultural data while other issues are being worked out. For example, some institutions in the UK are making lower-resolution images available for re-use under an open licence while reserving high-resolution versions for commercial sales and licensing. Or they may differentiate between scholarly and commercial use, or use more restrictive licences for commercially valuable images and release everything else openly.

I think this type of access is better than nothing, particularly if organisations can learn from the experience and release more data next time. Because these hybrid models are often experimental, their reception is important, and it’s helpful for GLAMs to be able to show they’ve had a positive impact and hopefully helped create relationships with groups like Wikipedia.


Cultural data is data about objects, publications (such as books, pamphlets, posters or musical scores), archival material, etc, created and distributed by museums, libraries, archives and other organisations.


It’s a useful distinction to discuss early with other cultural heritage staff, as it’s easy to be talking at cross-purposes: ‘data’ can refer to different types of content, from metadata or tombstone records (the basic titles, names, dates, places, materials, etc. of a catalogue record), to entire collection records (including researched and interpretive descriptions of objects, bibliographic data, and related themes and narratives), to full digital surrogates of an object, document or book as images or transcribed text. Some organisations release open metadata; others release all their data, including their images. If you can’t release open data (full content or ‘digital surrogates’ like photographs or texts), then at least open up the metadata (data about the content), for example as CC0, with the rest under another licence. Releasing data may involve licensing images, offering downloads from catalogue sites, ‘content donations’, APIs and machine-facing interfaces, term lists, etc. Much of the data that isn’t images isn’t immediately interesting, and may be designed for inter-collection interoperability or mashups rather than media commons.
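To make the metadata-versus-content distinction concrete, here’s a minimal sketch of the ‘CC0 metadata, more restrictive licence for the rest’ pattern described above. The field names, licence choices and helper function are all hypothetical illustrations, not any real institution’s schema:

```python
# Hypothetical sketch: split a catalogue record into openly licensed
# tombstone metadata (released as CC0) and richer content kept under a
# more restrictive licence. Field names here are illustrative only.

TOMBSTONE_FIELDS = {"title", "maker", "date", "place", "materials"}

def split_record(record):
    """Return (open_metadata, restricted_content) for one catalogue record."""
    metadata = {k: v for k, v in record.items() if k in TOMBSTONE_FIELDS}
    content = {k: v for k, v in record.items() if k not in TOMBSTONE_FIELDS}
    return (
        {"licence": "CC0-1.0", "fields": metadata},
        {"licence": "CC-BY-NC-4.0", "fields": content},
    )

record = {
    "title": "Pamphlet: village fete, 1898",
    "maker": "Unknown",
    "date": "1898",
    "description": "A researched, interpretive description of the pamphlet...",
    "image": "scan_0042.tiff",
}
open_metadata, restricted_content = split_record(record)
```

The design point is simply that the split is a policy decision encoded in data, so an institution can widen `TOMBSTONE_FIELDS` (and so release more openly) without re-engineering anything.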

Why is open cultural data important?

Before I go on, why do we care? Open cultural data is the foundation on which many projects can be built. It helps achieve organisational goals and mission; it can increase engagement with content; it can create a ‘network effect’ with related institutions; and it can be re-used by people who share your goals around access to knowledge and information – people like Wikipedians.

Some key moments in open cultural data

Events I discussed included the founding of Wikimedia, Europeana and Flickr Commons, previous GLAM-Wiki conferences, changes in licences for art images, library catalogue records and museum content, GLAM APIs and linked data services and the launch of the Digital Public Library of America next week.

Lessons learnt

Many of the changes are the results of years of conversation and collaboration – change is slow but it does happen. GLAMs work through slow iterations – try something, and if no-one dies, they’ll try something else. We are all ambassadors, and we are all translators, helping each domain understand the other.

Contradictory things GLAMs are told they must do

  • Give content away for the benefit of all
  • Monetise assets; protect against loss of potential income; protect against mis-use of collections; conserve collections in perpetuity; protect the IP of artists; demonstrate ROI on digitisation

It’s not easy for GLAMs to release all their data under an entirely open licence, but they don’t do it just to be annoying – it’s important to understand some of the pressures they’re under.  For example, GLAMs usually need to be able to track uses of their data and content to show the impact of digitising and publishing content, so they prefer attribution licences.

The issue of potential lost income – imaginary money that could be made one day if circumstances change, or profit that someone else makes off their opened data – is particularly hard to deal with [and here I ad-libbed, saying that it was like worrying about failing to meet the love of your life because you got on a different tube carriage – you can’t live your life chasing ghosts]. Ideally, open data needs to be understood as an input to the creative economy rather than an item on the balance sheet of an individual GLAM.

GLAMs worry about reputational damage, whether appearing on the front page of a tabloid newspaper for the ‘wrong’ reasons, questions being asked in Parliament, or critique from Wikipedians.  Over time, their mindset is changing from keeping ‘our data’ to being holders, custodians of our shared heritage.

Conversations, communities, collaborations

Conversations matter… we’re all working towards the same goal, but we have different types of anxieties and different problems we have to address.

GLAMs are about collections, knowledge, and audiences. Unlike most online work, they are used to seeing the excitement people experience walking through their door – help GLAMs understand what Wikipedians can do for different audiences by making those audiences real to them. GLAMs are also used to being wined and dined before you lay the hard word on them. Just because you don’t need to ask for permission to use content doesn’t mean you shouldn’t start a conversation with an organisation. There are lots of people with similar goals inside organisations, so try to find them and work with them. Trust is a currency, don’t blow it!

Being truly collaborative sometimes means compromising (or picking your battles) and it definitely means practising empathy. Open data people could stop talking about open data as something you *do* to GLAMs, and GLAMs could stop thinking open data people just want to make your life difficult.

The role of higher powers

Government attitudes to open data make a big difference and they can also change the risks associated with publishing orphan works.  Governments can also help GLAMs open up their content by indemnifying them against the chance that someone else will monetise their data – consider it not a failure of the GLAM but a contribution to the creative and digital economy.

Things that are better than a poke in the eye with a sharp stick

  1. Kittens (and puppies)
  2. Cultural data that’s available online but isn’t (yet) openly licensed
  3. Cultural data online that is licensed for non-commercial use

Yes, the last two aren’t ideal, but they are a great deal better than nothing.

Into the future…

GLAMs and Wikipedians may move at different paces, and may have different priorities and different ways of viewing the world, but we’re all working towards the same goals. Not everything is as open as it could be, but a lot more is open than it used to be. I sensed yesterday [the first day of the conference] that there are still some tensions between Wikimedians and GLAMers, moments when we need to take a deep breath and put empathy before a pithy put-down, but I loved that Kat Walsh’s welcome yesterday described how Wikipedia used to focus on how different it was from others, but now focuses on reaching out to others and figuring out how we’re the same.

GLAMs and Wikipedians have already used open cultural data to make the world a better place. Let’s celebrate the progress we’ve made and keep working on that…

GLAM-WIKI 2013 Friday attendees photograph by Mike Peel.

Congratulations to everyone who helped make it a great event, but particularly to Daria Cybulska and Andrew Gray (@generalising) for making everything work so smoothly, and Liam Wyatt (@wittylama) for the original invitation to speak.

New challenges in digital history: sharing women’s history on Wikipedia – my talk notes

I’m at The Albert M. Greenfield Digital Center for the History of Women’s Education at Bryn Mawr College for the inaugural Women’s History in the Digital World Conference. Since I’m about to speak and ask historians to share their research and write history in public, I thought I should also be brave and share my draft talk notes (which I’ve now updated with formatted references, though Blogger is still re-formatting things slightly oddly).

Introduction: New challenges in digital history: sharing women’s history on Wikipedia

[slide – title, my details]
Hi, I’m Mia. I’m doing a PhD on scholarly crowdsourcing, or collaboratively creating online resources, and thinking about the impact of digitality on the practices of historians, so this paper is indirectly related to my research but isn’t core to it.
I proposed this paper as a deliberate provocation: ‘if we believe the subjects of our research are important, then we should ensure they are represented on freely available encyclopedic sites like Wikipedia’. Just in case you’re not familiar with it, Wikipedia is a free online encyclopedia ‘that anyone can edit’. It contains 25 million articles in 286 languages, over 4 million of them in English, and has 100,000 active contributors.[1]

‘Brilliant Women’ at the National Portrait Gallery

The genesis of this paper was two-fold. The 2008 exhibition ‘Brilliant Women: 18th Century Bluestockings‘ at the UK National Portrait Gallery made the point that ‘Despite the fact that “bluestockings” made a substantial contribution to the creation and definition of national culture their intellectual participation and artistic interventions have largely been forgotten’. As a computer programmer, I find reinventing the wheel and other inefficient processes maddening, and I began to think about how digital publishing could intervene in the cycle of remembering and forgetting that seemed to be the fate of brilliant women throughout history. How could historians use digital platforms to stop those histories being lost and to make them easy for others to find?

[Screenshot – Caitlin Moran quote from How to be a woman: ‘Even the ardent feminist historian, male or female – citing Amazons and tribal matriarchies and Cleopatra – can’t conceal that women have done basically f*ck-all for the last 100,000 years’]
A few years later, by then a brand-new PhD student, I attended the Women’s History Network conference in London in 2011 and learnt of so many interesting lives that challenged conventional mainstream historical narratives of gender. I wished that others could hear those stories too. But when I asked if any of these histories were available outside academia on sites like Wikipedia, there was a strong sense that editing Wikipedia was something that other people did. But who better to make a case for better representation of women’s histories than the people in that room? Who else has the skills, knowledge and the passion? Some academic battles may have been won regarding the importance of women’s histories, but representing women’s histories on the sites where ordinary people start their queries is hugely important. The quote on this slide illustrates why – even if it was meant in jest, it represents a certain world view.

WikiWomen’s Collaborative

[slide – logos from ]
Of course, I’m not the first, and definitely not the most qualified to make this point. I would also like to acknowledge the work of many groups and individuals, particularly within Wikipedia, that’s preceded this.[2]

[slide – Scripps editathon, #tooFEW]
Things move fast in the digital world and we’re at a different moment than the one in which I proposed this paper. Gender issues on Wikipedia had been discussed for a number of years, but there’s been a recent burst of activity, including the #tooFEW (‘Feminists Engage Wikipedia’) editathons – ‘a scheduled time where people edit Wikipedia together, whether offline, online, or a mix of both’[3] – held online and in person across four physical sites.[4][5] I was going to be provocative and ask you to create Wikipedia entries about the histories you’ve invested so much in researching, but some of that is happening already. As a result, this is version 2 of this paper, but my starting question remains the same – assuming we believe that women’s history is important, what’s wrong with our current methods of research dissemination and dialogue?

The case of the Invisible Scholarship

[slide – outline of section]
Cumulative centuries of archival and theoretical work have been spent recovering women’s histories, yet much of this inspiring scholarship might as well not exist when so few people have access to it. Sadly, it’s currently the case that scholarship that isn’t deliberately made public is invisible outside academia. The open access movement, with all its thorny complications, is one potential solution. Engaging in new forms of open scholarship and disseminating research on sites where the public already goes to learn about history is another.

If it’s not Googleable, it doesn’t exist.

[slide – screenshot of unsuccessful search for Ina von Grumbkow]
Most content searches start and end online. The content and links available to search engines inform their assumptions about the world, and they in turn shape the world view presented on the results screen. If the name of a historical figure doesn’t show up in Google, how else would someone find out about them? While college students might be heavy users of Google’s specialist Google Scholar search, it’s unlikely that people would come across it accidentally, not least because there’s a ‘semantic gap’ between the language used in academia and the language used in everyday speech. Writing for Wikipedia means writing in everyday language, and the site is heavily indexed by search engines – it doesn’t take long for content created on Wikipedia – even on a user’s talk page and not the main site – to show up in Google results. So one reason to take history on Wikipedia seriously is that it affects what search engines know about the world.

‘Did you mean… hegemony?’

Search for ‘Viscountess Ranelagh’, Google says ‘Did you mean Viscount’. No. 

[slide – screenshot  of search for ‘Viscountess Ranelagh and the Authorisation of Women’s Knowledge in the Hartlib Circle’, Google says ‘Did you mean Viscount’. No.]
Scholarship and sources contained in specialist online archives and repositories are often off-limits to the Google bots that crawl the web looking for content to index. Because search engines normalise certain assumptions about the world, getting more content about women’s histories into publicly accessible spaces will eventually have an effect on the algorithms that determine suggestions for ‘did you mean’ etc. Contributions to sites like Wikipedia can eventually become contributions to the ‘knowledge graphs’ that determine the answers to questions we ask online.

If it’s behind a paywall, it only exists for a privileged few

[Slide – Screenshot of blocked attempt to access ‘Wives and daughters of early Berlin geoscientists and their work behind the scenes’]
Specialist users will be able to find academic research via Google Scholar, but any independent scholars in attendance will be able to speak to the difficulties in gaining access to journal articles without membership of an institutional library. Journal articles obviously have a lot of value within academic communities, but the research they represent is only available to a privileged few.

Why does Wikipedia matter?

[slide: For some, Wikipedia is the font of all wisdom]
Wikipedia is one of the most visited websites in the world. As one commentator said, ‘people turn to Wikipedia as an objective resource’ but ‘it’s not so objective in many ways’.[6]

However, as the free online encyclopedia ‘that anyone can edit’, it also provides the ability to take direct action to fix the under-representation of women’s history. William Cronon, President of the AHA, said ‘Wikipedia provides an online home for people interested in histories long marginalized by the traditional academy’[7] – this may not be entirely true yet, but we can hope.

Wikipedia is not yet encyclopedic

[Slide – Ina screenshot]
The English version of Wikipedia has over 4 million articles but it still has some way to go to become truly encyclopedic. Martha Saxton has noted the absence of women’s history content on Wikipedia and was distressed by ‘its superficiality and inaccuracies when present’.[8] Just as female assistants, secretaries, collectors, illustrators, correspondents, translators, salonists, cataloguers, text book writers, popularisers, explorers, pioneers and colleagues have been left out of traditional academic histories and gradually reclaimed by historians, they are often still invisible on Wikipedia. This may be partly because not enough women edit Wikipedia – as Wikipedia user Gobonobo says, ‘editors often contribute to topics they are familiar with and that concern them […] This systemic bias has the potential to exacerbate an historical record that already gives undue emphasis to men.’[9]

The under-representation of women’s history undermines Wikipedia’s claim to be encyclopedic. Issues include missing entries or omissions in coverage for existing topics, entries with inaccurate content, a failure to represent a truly ‘neutral point of view’, and a representation of ‘male’ as the default gender.

Many notable women have been buried in pages titled for their husbands, brothers, tutors, etc. In 1908 Ina von Grumbkow undertook an expedition to Iceland. She later made significant contributions to the field of natural history and wrote several books, but other than passing references online and a mention on her husband’s Wikipedia page, her story is only available to those with access to sources like the ‘Earth Sciences History’ journal.[10][11]

[Slide: ‘Main articles: List of Fellows of the Royal Society and List of female Fellows of the Royal Society’.]
Some of the categories used in Wikipedia posit the default gender as male. For example, there’s a ‘List of Fellows of the Royal Society’ and a ‘List of female Fellows of the Royal Society’.

Wikipedia and the challenges of digital history

Writing for Wikipedia encapsulates many, but not all, of the challenges of digital history.

New forms of writing

Writing for Wikipedia calls upon historians to write engaging, intellectually accessible, succinct text that still accurately represents its subject. It not only means valuing the work and skills in writing public history, it requires the ability to write history in public.

Writing for a ‘neutral point of view’ – one of the key values of Wikipedia – is challenging for historians. Many may find it difficult to believe that it’s even possible, and it’s difficult to achieve.[12]

Unlike traditional historical scholarship, characterised by ‘possessive individualism’ [13] and honed to perfection before publication, Wikipedia entries are considered a work in progress, and anyone who spots an issue is asked to fix it themselves or flag it for others to review.

It won’t advance your career

While it might have a large public impact, editing Wikipedia is work that isn’t credited in academia, and it takes time that could be used for projects that would count for career advancement. More importantly from Wikipedia’s point of view, you can’t promote your own work on the site, so writing about your own research interests is not straightforward if not many people have published in your area of expertise.

“On the internet, nobody knows you’re a professor”

In a comment with ‘pointers for academics who would like to contribute to Wikipedia’ on a Chronicle article, commentator ‘operalala’ said, ‘”On the internet nobody knows you’re a professor.” If you’re used to deferential treatment at your home institution, you’ll be treated like everybody else in the Wide Open Internet.'[14] Or in William Cronon’s words, you must ‘give up the comfort of credentialed expertise’.[15] Anyone can edit, re-shape or even delete your work.

Just like academia, Wikipedia has ways of establishing the credibility and reputation of a contributor, and just like any other community, there are etiquettes and conventions to observe. Claire Potter warns newcomers to the community that it’s important not to think of Wikipedia as ‘another realm for intellectuals to colonize and professionalize’.[16]

The opportunities and challenges of women’s history as public history on Wikipedia


#WomenSciWP editathon at the Royal Society

Wikipedia uses red links to represent entries that could be created but don’t yet exist. Women’s history editathons often create lists of red-linked names as suggested topics that could be created.[17] Projects on and outside Wikipedia, and events at institutions like the Smithsonian and Royal Society and just last weekend at three THATCamps across the United States, might be part of a critical mass of people learning how to edit Wikipedia to better include women’s history.

Compared to the lengthy process of writing for academic publication, a new Wikipedia entry can be created in a few hours, allowing for time to structure the content and format the references as necessary to pass the first quality bar. An existing entry can be corrected in minutes. Each editathon or personal edit improves the representation of women’s history, and there’s something very satisfying about turning red links blue.

Ina von Grumbkow’s name red-linked on her husband’s Wikipedia page

Adding the brackets that turn a piece of text into a red link, suggesting the possibility of an entry to be created, is a small but potentially powerful intervention. Red links can render the gaps and silences visible.
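In wiki markup the intervention really is just a pair of double square brackets. A sketch, using the Ina von Grumbkow example from above (the surrounding sentence is invented for illustration):

```wikitext
<!-- Plain text in her husband's entry, invisible to link-following readers: -->
... accompanied on the expedition by his fiancée Ina von Grumbkow ...

<!-- Adding double square brackets creates a link, which Wikipedia renders
     in red until someone creates the 'Ina von Grumbkow' page: -->
... accompanied on the expedition by his fiancée [[Ina von Grumbkow]] ...
```

Once the new entry exists, the same link turns blue everywhere it appears, with no further edits needed.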


Creating or editing entries on women’s history may be relatively easy, but making sure they stay there is less so. There are countless examples of women having to fight to keep changes in as other editors revert them, or argue about their choice of sources or the significance or notability of their topic. Wikipedians are zealous in preventing spammers and crackpots polluting the quality of the site, which explains some of the rapid ‘nominations for deletion’, but some pockets of the site are also hostile to women’s history or to women themselves.

Saxton said editing Wikipedia is ‘not for the faint of heart’ and ‘a lesson in how little women’s history has penetrated mainstream culture’. There’s work to be done in sharing and normalising an understanding of the historical circumstances and cultural contexts that created difficulties for women. We might know that, as Janet Abbate said, ‘The laws and social conventions of a given time and place strongly shape the kinds of technical training available to women and men, the career options open to them, their opportunities for advancement and recognition’ [18] but until other Wikipedians understand that, there will continue to be issues around ‘notability’. Having those conversations as many times as necessary might be tiring and uncomfortable or even controversial, but it’s part of the work of representing women’s history on Wikipedia.


‘Reliable sources’

Wikipedians may have different definitions of ‘reliable sources’ than scholarly researchers. As one academic discovered:
‘Wikipedia is not “truth”, Wikipedia is “verifiability” of reliable sources. Hence, if most secondary sources which are taken as reliable happen to repeat a flawed account or description of something, Wikipedia will echo that.’[19]

The same gatekeepers matter

As some academics have found, ‘Wikipedia differs from primary-source research, from scholarly writing, and how it privileges existing rather than new knowledge’.[20][21] Wikipedia is not the place to redress fundamental issues with silences in the archives or in the profession overall, not least because on Wikipedia, primary research is bad and secondary sources are good.[22] This puts the onus back on to traditional academic publishing in peer-reviewed journals and books that can be cited in Wikipedia articles, though other published works such as ‘credible and authoritative books’ and ‘reputable media sources’ can also be cited.


‘A person is presumed to be notable if he or she has received significant coverage in reliable secondary sources that are independent of the subject. […] the person who is the topic of a biographical article should be “worthy of notice” – that is, “significant, interesting, or unusual enough to deserve attention or to be recorded” within Wikipedia as a written account of that person’s life.’ [23] ‘The common theme in the notability guidelines is that there must be verifiable, objective evidence that the subject has received significant attention from independent sources to support a claim of notability.’ [24] This creates obvious difficulties for some women’s histories.

It’s also difficult to judge where ‘notability’ should end. When does focusing on exceptional women become counter-productive? When do we risk creating a new canon? When does it stop being remarkable that a woman became prominent in a field and start being more accepted, if still not expected? [25] At what point should writing shift from individual entries to integration into more general topics?


Sometimes it’s hard to tell whether Wikipedia lags behind academia’s acceptance and general integration of women’s history into mainstream history or whether it is representative of the field’s more conservative corners. Recent digital history projects are doing a good job of explaining some of the issues with key sources for Wikipedia like the Oxford Dictionary of National Biography [26], and I’d hope that this continues. As Martha Saxton said, ‘integrating women’s experience into broad subjects’ is ‘both more challenging intellectually and ultimately, more to the point of the overall project of bringing women into our acknowledged history’. [27]

But it’s also clearly up to us to make a difference. If it’s worth researching the life and achievements of a notable woman, it’s worth making sure her contribution to history is available to the world while improving the quality of the world’s biggest encyclopaedia. And it doesn’t mean going it alone. It’s still Women’s History Month, so it’s not too late to sign up and join one of the women’s history projects, or to plan something with your students. [28] [29] [30]

I’d like to close with quotes from two different women. Executive Director of the Wikimedia Foundation, Sue Gardner: ‘Wikipedia will only contain “the sum of all human knowledge” if its editors are as diverse as the population itself: you can help make that happen. And I can’t think of anything more important to do, than that.’ [31]
And to quote Laura Mandell’s keynote yesterday: ‘Let’s write and publish about each other’s projects so that future historians will have those sources to write about. … Nothing changes through thinking alone, only through massive amounts of re-iteration’. [32]

[Update: based on questions afterwards, you may want to get started with Wikipedia:How to run an edit-a-thon, or sign up and say hello at Wikipedia:WikiProject Women’s History. You could also join in the Global Women Wikipedia Write-In #GWWI on April 26 (1-3pm, US EST), and they have a handy page on How to Create Wikipedia Entries that Will Stick.

And update April 30, 2013: check out ‘Learning to work with Wikipedia – New Pages Patrol and how to create new Wikipedia articles that will stick‘ by the excellent Adrianne Wadewitz.

Update, June 9: if you’re thinking of setting a class assignment involving editing Wikipedia, check out their ‘For educators‘ and ‘Assignment Design‘ pages for tips and contact points.  June 18: see also Nicole Beale’s ‘Wikipedia for Regional Museums‘.

Update, August 21, 2013: content on Wikipedia appears to have had an additional boost in Google’s search results, making it even more important in shaping the world’s knowledge. More at ‘The Day the Knowledge Graph Exploded‘.

New link, February 2014: Jacqueline Wernimont’s Notes for #tooFEW Edit a thon based on a training session by Adrianne Wadewitz are a useful basic introduction to editing.]


[1] Various. ‘Wikipedia’. 2013. Wikipedia.
[5] Barnett, Fiona. 2013. ‘#tooFEW – Feminists Engage Wikipedia’. HASTAC. March 11.
[6] Gobry, Pascal-Emmanuel. 2011. ‘Wikipedia Is Hampered By Its Huge Gender Gap’. Business Insider. January 31.
[7] Cronon, William. 2012. ‘Scholarly Authority in a Wikified World’. Perspectives on History, American Historical Association. February 7.
[8] Saxton, Martha. 2012. ‘Wikipedia and Women’s History: A Classroom Experience’. Writing History in the Digital Age.
[9] Gobonobo. 2013. ‘User:Gobonobo/Gender Gap Red List’. Wikipedia.
[10] Various. ‘Hans Reck’. Wikipedia.
[11] Mohr, B. A. R. 2010. ‘Wives and daughters of early Berlin geoscientists and their work behind the scenes’. Earth Sciences History 29 (2): 291–310.
[12] As commenter Operalala suggested, one challenge is recognising ‘the difference between the plurality of academia and the singularity of a Wikipedia article’. Comment on Messer-Kruse, Timothy. 2012. ‘The “Undue Weight” of Truth on Wikipedia’. The Chronicle of Higher Education. February 12.
[13] Rosenzweig, Roy. 2006. ‘Can History Be Open Source? Wikipedia and the Future of the Past’. The Journal of American History 93 (1) (June): 117–46.
[14] Operalala on Messer-Kruse, 2012.
[15] Cronon, 2012.
[16] Potter, Claire. 2013. ‘Looking for the Women on Wikipedia: Readers Respond’. The Chronicle of Higher Education. March 14.
[18] Abbate, Janet. 2003. ‘Guest Editor’s Introduction: Women and Gender in the History of Computing’. IEEE Annals of the History of Computing 25 (4): 4–8.
[19] Messer-Kruse, 2012.
[20] Anderson, Jill. 2013. ‘A Supposedly Fun Thing I’ll (Probably) Never Do Again’. True Stories Backward.
[21] Messer-Kruse, 2012.
[22] Various. 2013. ‘Wikipedia:No Original Research’. Wikipedia.
[23] Various. 2013. ‘Wikipedia:Notability (people)’. Wikipedia.
[24] Various. 2013. ‘Wikipedia:Notability’. Wikipedia.
[25] Or as Christie Aschwanden says when proposing the ‘Finkbeiner test’ for contemporary journalism about women in science, ‘treating female scientists as special cases only perpetuates the idea that there’s something extraordinary about a woman doing science’. Aschwanden, Christie. 2013. ‘The Finkbeiner Test’. Double X Science. March 5.
[26] For a recent example, see ‘An Entry of One’s Own, or Why Are There So Few Women In the Early Modern Social Network?’ 2013. Six Degrees of Francis Bacon. March 8. and ‘Gender and Name Recognition’. 2013. Six Degrees of Francis Bacon. March 20.
[27] Saxton, 2012.
[29] Potter, Claire. 2013. ‘Prikipedia? Or, Looking for the Women on Wikipedia’. The Chronicle of Higher Education. March 10.
[30] For advice, see: Wikimedia Outreach. 2013. ‘Education Portal/Tips and Resources’. Wikipedia Outreach Wiki.
[31] A comment on Gardner, Sue. 2010. ‘Unlocking the Clubhouse: Five Ways to Encourage Women to Edit Wikipedia’. Sue Gardner’s Blog. November 14.
[32] Mandell, Laura. 2013. ‘Feminist Critique vs. Feminist Production in Digital Humanities’. Keynote presented at the Women’s History in the Digital World conference, Bryn Mawr College, Pennsylvania, March 22.

Slow and still dirty Digital Humanities Australasia notes: day 3

These are my very rough notes from day 3 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Quick and dirty Digital Humanities Australasia notes: day 2) held in Canberra’s Australian National University at the end of March.

We were welcomed to Day 3 by the ANU’s Professor Marnie Hughes-Warrington (who expressed her gratitude for the methodological and social impact of digital humanities work) and Dr Katherine Bode. The keynote was Dr Julia Flanders on ‘Rethinking Collections’, AKA ‘in praise of collections’… [See also Axel Bruns’ live blog.]

She started by asking: what do we mean by a ‘collection’? What’s the utility of the term? What’s the cultural significance of collections? The term speaks of agency, motive, and implies the existence of a collector who creates order through selectivity. Sites like eBay, Flickr and Pinterest respond to a weirdly deep-seated desire to reassert the ways in which things belong together. The term ‘collection’ implies that a certain kind of completeness may be achieved. Each item is important in itself and also in relation to other items in the collection.

There’s a suite of expected activities and interactions in the genre of digital collections, projects, etc. They’re deliberate aggregations of materials that bear, and demand, individual scrutiny. Attention is given to the value of scale (and distant reading), which reinforces the aggregate approach…

She discussed the value of deliberate scope, deliberate shaping of collections, not craving ‘everythingness’. There might also be algorithmically gathered collections…

She discussed collections she’s involved with – TAPAS, DHQ, Women Writers Online – all using flavours of TEI, the same publishing logic and component stack, providing the same functionality in the service of the same kinds of activities, though they work with different materials for different purposes.

What constitutes a collection? How are curated collections different to user-generated content or just-in-time collections? Back ‘then’, collections were things you wanted in your house or wanted to see in the same visit. What does the ‘now’ of collections look like? Decentralisation in collections ‘now’… technical requirements are part of the intellectual landscape, part of larger activities of editing and design. A crucial characteristic of collections is the variety of philosophical urgencies they respond to.

The electronic operates under the sign of limitless storage… potentially boundless inclusiveness. Design logic is a craving for elucidation, more context, the ability for the reader to follow any line of thought they might be having and follow it to the end. Unlimited informational desire, closing in of intellectual constraints. How do boundedness and internal cohesion help define the purpose of a collection? Deliberate attempt at genre not limited by technical limitations. Boundedness helps define and reflect philosophical purpose.

What do we model when we design and build digital collections? We’re modelling the agency through which the collection comes into being and is sustained through usage. Design is a collection of representational practices, item selection, item boundaries and contents. There’s a homogeneity in the structure, the markup applied to items. Item-to-item interconnections – there’s the collection-level ‘explicit phenomena’ – the directly comparable metadata through which we establish cross-sectional views through the collection (e.g. by Dublin Core fields) which reveal things we already know about texts – authorship of an item, etc. There’s also collection-level ‘implicit phenomena’ – informational commonalities, patterns that emerge or are revealed through inspection; change shape imperceptibly through how data is modelled or through software used [not sure I got that down right]; they’re always motivated so always have a close connection with method.
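To make that distinction concrete for myself: the ‘explicit phenomena’ are cross-sectional views built from directly comparable metadata, which in code is essentially a group-by over a shared field. This is my own toy sketch rather than anything from the talk, and the Dublin Core records below are entirely hypothetical:

```python
# A minimal sketch of a collection-level 'explicit phenomenon': a
# cross-sectional view through records via a shared Dublin Core field.
from collections import defaultdict

# Hypothetical catalogue records (illustrative only)
records = [
    {"dc:title": "Poems", "dc:creator": "Behn, Aphra", "dc:date": "1684"},
    {"dc:title": "Oroonoko", "dc:creator": "Behn, Aphra", "dc:date": "1688"},
    {"dc:title": "A Serious Proposal", "dc:creator": "Astell, Mary", "dc:date": "1694"},
]

def cross_section(items, field):
    """Group record titles by one directly comparable metadata field."""
    view = defaultdict(list)
    for item in items:
        view[item[field]].append(item["dc:title"])
    return dict(view)

print(cross_section(records, "dc:creator"))
# {'Behn, Aphra': ['Poems', 'Oroonoko'], 'Astell, Mary': ['A Serious Proposal']}
```

The ‘implicit phenomena’ are the harder part – patterns that only emerge through inspection and that shift with how the data is modelled, so there’s no one-line group-by for those.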

Readerly knowledge – what can the collection assume about what the reader knows? A table of contents is only useful if you can recognise the thing you want to find in it – they’re not always self-evident. How does the collection’s modelling affect us as readers? Consider the effects of choices on the intellectual ecology of the collection, including its readers. Readerly knowledge has everything to do with what we think we’re doing in digital humanities research.

The Hermeneutics of Screwing Around (pdf). Searching produces a dynamically located just-in-time collection… Search is an annoying guessing game with a passive-aggressive collection. But we prefer to ask a collection to show its hand in a useful way (i.e. browse)… Search -> browse -> explore.

What’s the cultural significance of collections? She referenced Liu’s Sidney’s Technology… A network as flow of information via connection, perpetually ongoing contextualisation; a patchwork is understood as an assemblage, it implies a suturing together of things previously unrelated. A patchwork asserts connections by brute force. A network assumes that connections are there to be discovered, connected to. Patchwork, mosaic – connects pre-existing nodes that are acknowledged to be incommensurable.

We avow the desirability of the network, yet we’re aware of the itch of edge cases, data that can’t be brought under rule. What do we treat as noise and what as signal, what do we deny is the meaning of the collection? Is exceptionality or conformance to type the most significant case? On twitter, @aylewis summarised this as ‘Patchworking metaphor lets us conceptualise non-conformance as signal not noise’

Pay attention to the friction in the system, rather than smoothing it over. Collections both express and support analysis. Expressing theories of genre etc in internal modelling… Patchwork – the collection articulates the scholarly interest that animated its creation but also interests of the reader… The collection is animated by agency, is modelled by it, even while it respects the agency we bring as readers. Scholarly enquiry is always a transaction involving agency on both ends.

My (not very good) notes from discussion afterwards… there was a question about digital femmage; discussion of the tension between the desire for transparency and the desire to permit many viewpoints on material while not disingenuously disavowing the roles in shaping the collection; the trend at one point for factoids rather than narratives (but people wanted the editors’ view as a foundation for what they do with that material); the logic of the network – a collection as a set of parameters not as a set of items; Alan Liu’s encouragement to continue with theme of human agency in understanding what collections are about (e.g. solo collectors like John Soane); crowdsourced work is important in itself regardless of whether it comes up with the ‘best’ outcome, by whatever metric. Flanders: ‘the commitment to efficiency is worrisome to me, it puts product over people in our scale of moral assessment’ [hoorah! IMO, engagement is as important as data in cultural heritage]; a question about the agency of objects, with the answer that digital surrogates are carriers of agency, the question is how to understand that in relation to object agency?

GIS and Mapping I

The first paper was ‘Mapping the Past in the Present’ by Andrew Wilson, which was a fast run-through some lovely examples based on Sydney’s geo-spatial history. He discussed the spatial turn in history, and the mid-20thC shift to broader scales, territories of shared experience, the on-going concern with the description of space, its experience and management.

He referenced Harley’s ‘Deconstructing the map’ (1989): ‘cartography is seldom what the cartographers say it is’. All maps are lies. All maps have to be read, closely or distantly. He referenced Grace Karskens’ On the rocks and discussed the reality of maps as evidence, an expression of European expansion; the creation of the maps is an exercise in power. Maps must be interpreted as evidence. He talked about deriving data from historic maps, using regressive analysis to go back in time through the sources. He also mentioned TGIS – time-enabled GIS. Space-time composite model – when you have lots and lots of temporal changes, create a polygon that describes every change in the sequence.

The second paper was ‘Reading the Text, Walking the Terrain, Following the Map: Do We See the Same Landscape?’ by Øyvind Eide. He said that viewing a document and seeing a landscape are often represented as similar activities… but seeing a landscape means moving around in it, being an active participant. Wood (2010) on the explosion of maps around 1500 – part of the development of the modern state. We look at older maps through modern eyes – maps weren’t made for navigation but to establish the modern state.

He’s done a case study on text vs maps in Scandinavia, 1740s. What is lost in the process of converting text to maps? Context, vagueness, under-specification, negation, disjunction… It’s a combination of too little and too much. Text has information that can’t fit on a map and text that doesn’t provide enough information to make a map. Under-specification is when a verbal text describes a spatial phenomenon in a way that can be understood in two different ways by a competent reader. How do you map a negative feature of a landscape, i.e. things that are stated not to be there? ‘Or’ cannot be expressed on a map… Different media, different experiences – each can mediate only certain aspects of total reality (Elleström 2010).

The third paper was ‘Putting Harlem on the Map’ by Stephen Robertson. His article on ‘Writing History in the Digital Age’ is probably a good reference point: Putting Harlem on the Map; the site is at Digital Harlem. The project sources were police files, newspapers, organisational archives… They were cultural historians, focussed on individual-level data and events, what it was like to live in Harlem. It was one of the first sites to employ the geo-spatial web rather than GIS software. Information was extracted and summarised from primary sources; it wasn’t a digitisation project. They presented their own maps and analysis apart from the site to keep it clear for other people to do their work. After assigning a geo-location it is then possible to compare it with other phenomena from the same space. They used sources that historians typically treat as ephemera, such as society or sports pages, as well as the news in newspapers.

He showed a great list of event types they’ve gotten from the data… Legal categories disaggregate crime so it appears more often in the list though was the minority of data. Location types also offers a picture of the community.

Creating visualisations of life in the neighbourhood… When mapping at this detailed scale they were confronted with how vague most historical sources are about places and how they relate to other places. ‘Historians are satisfied in most cases to say that a place is “somewhere in Harlem”.’ He talked about visualisations as ‘asking, but not explaining, why there?’.

I tweeted that I’d gotten a lot more from his demonstration of the site than I had from looking at it unaided in the past, which led to a discussion with @claudinec and @wragge about whether the ‘search vs browse’ accessibility issue applies to geospatial interfaces as well as text or images (i.e. what do you need to provide on the first screen to help people get into your data project) and about the need for as many hooks into interfaces as possible, including narratives as interfaces.

Crowdsourcing was raised during the questions at the end of the session, but I’ve forgotten who I was quoting when I tweeted, ‘by marginalising crowdsourcing you’re marginalising voices’, on the other hand, ‘memories are complicated’.  I added my own point of view, ‘I think of crowdsourcing as open source history, sometimes that’s living memory, sometimes it’s research or digitisation’.  If anything, the conference confirmed my view that crowdsourcing in cultural heritage generally involves participating in the same processes as GLAM staff and humanists, and that it shouldn’t be exploitative or rely on user experience tricks to get participants (though having made crowdsourcing games for museums, I obviously don’t have a problem with making the process easier to participate in).

The final paper I saw was Paul Vetch, ‘Beyond the Lowest Common Denominator: Designing Effective Digital Resources’. He discussed the design tensions between: users and audiences (and ‘production values’); ubiquity and trends; experimentation (and failure); sustainability (and ‘the deliverable’).

In the past, digital humanities has compartmentalised groups of users in a way that’s convenient but not necessarily valid. But funding pressure to serve wider audiences means anticipating lots of different needs. He said people make value judgements about the quality of a resource according to how it looks.

Ubiquity and trends: understanding what users already use; designing for intuition. Established heuristics for web design turn out to be completely at odds with how users behave.

Funding bodies expect deliverables, this conditions the way they design. It’s difficult to combine: experimentation and high production values [something I’ve posted on before, but as Vetch said, people make value judgements about the quality of a resource according to how it looks so some polish is needed]; experimentation and sustainability…

Who are you designing for? Not the academic you’re collaborating with, and it’s not to create something that you as a developer would use. They’re moving away from user testing at the end of a project to doing it during the project. [Hoorah!]

Ubiquity and trends – challenges include a very highly mediated environment; highly volatile and experimental… Trying to use established user conventions becomes stifling. (He called them ‘old nonsense’!) The ludic and experiential are increasingly important elements in how we present our research back.

Mapping Medieval Chester took technology designed for delivering contextual ads and used it to deliver information in context without changing perspective (i.e. without reloading the page, from memory). The Gough map was an experiment in delivering a large image but also in making people smile. Experimentation and failure… Online Chopin Variorum Edition was an experiment. How is the ‘work’ concept challenged by the Chopin sources? Technical/methodological objectives: superimposition; juxtaposition; collation/interpolation…

He discussed coping strategies for the Digital Humanities: accept and embrace the ephemerality of web-based interfaces; focus on process and experience – the underlying content is persistent even if the interfaces don’t last.  I think this was a comment from the audience: ‘if a digital resource doesn’t last then it breaks the principle of citation – where does that leave scholarship?’


So those are my notes. For further reference I’ve put up a CSV archive of #DHA2012 tweets here, but note it’s not on Australian time so it needs transposing to match the session times.

This was my first proper big Digital Humanities conference, and I had a great time.  It probably helped that I’m an Australian expat so I knew a sprinkling of people and had a sense of where various institutions fitted in, but the crowd was also generally approachable and friendly.

I was also struck by the repetition of phrases like ‘the digital deluge’, the ‘tsunami of data’ – I had the feeling there’s a barely managed anxiety about coping with all this data. And if that’s how people at a digital humanities conference felt, how must less-digital humanists feel?

I was pleasantly surprised by how much digital history content there was, and even more pleasantly surprised by how many GLAMy people were there, and consequently how much the experience and role of museums, libraries and archives was reflected in the conversations.  This might not have been as obvious if you weren’t on twitter – there was a bigger disconnect between the back channel and conversations in the room than I’m used to at museum conferences.

As I mentioned in my day 1 and day 2 posts, I was struck by the statement that ‘history is on a different evolutionary branch of digital humanities to literary studies’, partly because even though I started my PhD just over a year ago, I’ve felt the title will be outdated within a few years of graduation.  I can see myself being more comfortable describing my work as ‘digital history’ in future.

I have to finish by thanking all the speakers, the programme committee, and in particular, Dr Paul Arthur and Dr Katherine Bode, the organisers and the aaDH committee – the whole event went so smoothly you’d never know it was the first one!

And just because I loved this quote, one final tweet from @mikejonesmelb: Sir Ken Robinson: ‘Technology is not technology if it was invented before you were born’.

Designing for participatory projects: emergent best practice, getting discussion started

I was invited over to New Zealand (from Australia) recently to talk at Te Papa in Wellington and the Auckland Museum.  After the talks I was asked if I could share some of my notes on design for participatory projects and for planning for the impact of participatory projects on museums.  Each museum has a copy of my slides, but I thought I’d share the final points here rather than by email, and take the opportunity to share some possible workshop activities to help museums plan audience participation around its core goals.

Both talks started by problematising the definition of a ‘museum website’ – it doesn’t work to think of your ‘museum website’ as purely stuff that lives under your domain name when it now also includes the social media accounts under your brand, your games and mobile apps, and maybe also your objects and content on Google Art Project or even your content in a student’s Tumblr. The talks were written to respond to the particular context of each museum so they varied from there, but each ended up with these points. The sharp-eyed among you might notice that they’re a continuation of ideas I first shared in my Europeana Tech keynote: Open for engagement: GLAM audiences and digital participation. The second set are particularly aimed at helping museums think about how to market participatory projects and sustain them over the longer term by making them more visible in the museum as a whole.

Best practice in participatory project design

  • Have an answer to ‘Why would someone spend precious time on your project?’
  • Be inspired by things people love
  • Design for the audience you want
  • Make it a joy to participate
  • Don’t add unnecessary friction, barriers (e.g. don’t add sign-up forms if you don’t really need them, or try using lazy registration if you really must make users create accounts)
  • Show how much you value contributions (don’t just tell people you value their work)
  • Validate procrastination – offer the opportunity to make a difference by providing meaningful work
  • Provide an easy start and scaffolded tasks (see e.g. Nina Simon’s Self-Expression is Overrated: Better Constraints Make Better Participatory Experiences)
  • Let audiences help manage problems – let them know which behaviours are acceptable and empower them to keep the place tidy
  • Test with users; iterate; polish

Best practice within your museum

  • Fish where the fish are – find the spaces where people are already engaging with similar content and see how you can slot in, don’t expect people to find their way to you unless you have something they can’t find anywhere else
  • Allow for community management resources – you’ll need some outreach to existing online and offline communities to encourage participation, some moderation and just a general sense that the site hasn’t been abandoned. If you can’t provide this for the life of the project, you might need to question why you’re doing it.
  • Decide where it’s ok to lose control. Try letting go… you may find audiences you didn’t expect, or people may make use of your content in ways you never imagined. Watch and learn and tweak in response – this is a good reason to design in iterations, and to go into public or invited-beta earlier rather than later. 
  • Realistically assess fears, decide acceptable levels of risk. Usually fears can be turned into design requirements, they’re rarely show-stoppers.
  • Have a clear objective, ideally tied to your museum’s mission. Make sure the point of the project is also clear to your audience.
  • Put the audience needs first. You’re asking people to give up their time and life experience, so make sure the experience respects this. Think carefully before sacrificing engagement to gain efficiency.
  • Know how to measure success
  • Plan to make the online activity visible in the organisation and in the museum. Displaying online content in the museum is a great way to show how much you value it, as well as marketing the project to potential contributors.  Working out how you can share the results with the rest of the organization helps everyone understand how much potential there is, and helps make online visitors ‘real’.
  • Have an exit strategy – staff leave, services fold or change their T&Cs

I’d love to know what you think – what have I missed?  [Update: for some useful background on the organisational challenges many museums face when engaging with technology, check out Collections Access and the use of Digital Technology (pdf).]

More on designing museum projects for audience participation

I prepared this activity for one of the museums, but on the day the discussion after my talk went on so long that we didn’t need to use a formal structure to get people talking. In the spirit of openness, I thought I’d share it. If you try it in your organisation, let me know how it goes!

The structure – exploratory idea generation followed by convergence and verification – was loosely based on the ‘creativity workshops’ developed by City University’s Centre for Creativity (e.g. the RESCUE creativity workshops discussed in Use and Influence of Creative Ideas and Requirements for a Work-Integrated Learning System).  It’s designed to be a hackday-like creative activity for non-programmers.

In small groups…

  • Pick two strategic priorities or organisational goals…
  • In 5 minutes: generate as many ideas as possible
  • In 2 minutes: pick one idea to develop further

Ideas can include in-gallery and in-person activity; they must include at least two departments and some digital component.

Developing your idea…

  • You have x minutes to develop your idea
  • You have 2 minutes each to report back. Include: which previous museum projects provide relevant lessons? How will you market it? How will it change the lives of its target audience? How will it change the museum?
  • How will you alleviate potential risks?  How will you maximise potential benefits?
  • You have x minutes for general discussion. How can you build on the ideas you’ve heard?

For bonus points…

These discussion points were written for another museum, but they might be useful for other organisations thinking about audience participation and online collections:

What are the museum’s goals in engaging audiences with collections online?

  • What does success look like?
  • How will it change the museum?
  • Which past projects provide useful lessons?

How can the whole organisation be involved in supporting online conversations?

  • What are the barriers?
  • What small, sustainable steps can be taken?
  • Where are online contributions visible in the museum?

Quick and dirty Digital Humanities Australasia notes: day 2

What better way to fill in stopover time in Abu Dhabi than continuing to post my notes from DHA2012? [Though I finished off the post and re-posted once I was back home.] These are my very rough notes from day 2 of the inaugural Australasian Association for Digital Humanities conference (see also Quick and dirty Digital Humanities Australasia notes: day 1 and Slow and still dirty Digital Humanities Australasia notes: day 3). In the interests of speed I’ll share my notes and worry about my own interpretations later.

Keynote panel, ‘Big Digital Humanities?’

Day 2 was introduced by Craig Bellamy, and began with a keynote panel with Peter Robinson, Harold Short and John Unsworth, chaired by Hugh Craig. [See also Snurb’s liveblogs for Robinson, Short and Unsworth.] Robinson asked ‘what constitutes success for the digital humanities?’ and further, what does the visible successes of digital humanities mask? He said it’s harder for scholars to do high quality research with digital methods now than it was 20 years ago. But the answer isn’t more digital humanists, it’s having the ingredients to allow anyone to build bridges… He called for a new generation of tools and methods to support the scholarship that people want to do: ‘It should be as easy to make a digital edition (of a document/book) as it is to make a Facebook page’, it shouldn’t require collaboration with a digital humanist. To allow data made by one person to be made available to others, all digital scholarship should be made available under a Creative Commons licence (publishers can’t publish it now if it’s under a non-commercial licence), and digital humanities data should be structured and enriched with metadata and made available for re-use with other tools. The model for sustainability depends on anyone and everyone being able to access data.

Harold Short talked about big (or at least unescapable) data and the ‘Svensson challenge’ – rather than trying to work out how to take advantage of infrastructure created by and for the sciences, use your imagination to figure out what’s needed for the arts and humanities. He called for a focus on infrastructure and content rather than ‘data’.

John Unsworth reminded us that digital humanities is a certain kind of work in the humanities that uses computational methods as its research methods. It’s not just using digital materials, though it does require large collections of data – it also requires a sense of how the tools work.

What is the digital humanities?

Very different versions of ‘digital humanities’ emerged through the panel and subsequent discussion, leaving me wondering how they related to the different evolutionary paths of digital history and digital literature studies mentioned the day before. Meanwhile, on the back channel (from the tweets that are to hand), I wondered if a two-tier model of digital humanities was emerging – one that uses traditional methods with digital content (DH lite?); another that disrupts traditional methods and values. Though thinking about it now, the ‘tsunami’ of data mentioned is disruptive in its own right, regardless of the intentional choices one makes about research practices (which might have been what Alan Liu meant when he asked about ‘seamless’ and ‘seamful’ views of the world)…. On twitter, other people (@mikejonesmelb, @bestqualitycrab, @1n9r1d) wondered if the panel’s interpretation of ‘big’ data was gendered, generational, sectoral, or any other combination of factors (including the messiness and variability of historical data compared to literature) and whether it could have been about ‘disciplinary breadth and inclusiveness‘ rather than scale.

Data morning session

The first speaker was Toby Burrows on ‘Using Linked Data to Build Large‐Scale e‐Research Environments for the Humanities’. [Update: he’s shared his slides and paper online and see also Snurb’s liveblog.] Continuing some of the themes from the morning keynote panel, he said that the humanities has already been washed away in the digital deluge, the proliferation of digital stuff is beyond the capacity of individual researchers. It’s difficult to answer complex humanities questions only using search with this ‘industrialised’ humanities data, but large-scale digital libraries and collections offer very little support for functions other than search. There’s very little connection between data that researchers are amassing and what institutions are amassing.

He’s also been looking at historians’/humanists’ research practices [and selfishly I was glad to see many parallels with my own early findings]. The tools may be digital rather than paper and scissors, but historians are still annotating and excerpting as they always have. The ‘sharing’ part of their work has changed the most – it’s easier to share, and they can share at an earlier stage if they choose to do that, but not a lot has changed at the personal level.

Burrows said applying a linked data approach to manuscript research would go a long way to addressing the complexity of the field. For example, using global URIs for manuscripts and parts; separating names and concepts from descriptive information; and using linked data functions to relate scholarly activities (annotations, excerpts, representations etc) to manuscript descriptions, objects and publications. Linked data can provide a layer of entities that sits between research activities and descriptions/collections/publications, which avoids conflating the entities and the source material. Multiple naming schemes are necessary for describing entities and relationships – there’s no single authoritative vocabulary. It’s a permanent work in progress, with no definitive or final structure. Entities need to include individuals as well as categories, with a network graph showing relatedness and the evidence for that relatedness as the basic structure.
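The entity layer Burrows describes can be sketched in plain Python as a tiny set of subject–predicate–object statements. This is only an illustration of the idea (the URIs and property names below are invented examples, not HuNI’s or any real project’s identifiers): manuscripts and people get global URIs, descriptive text hangs off those URIs rather than standing in for them, and a scholarly activity such as an annotation points at the entity rather than at a free-text description.

```python
# A minimal triple-store sketch of a linked data entity layer.
# All URIs and predicates here are hypothetical examples.
triples = set()

def add(s, p, o):
    """Record one subject-predicate-object statement."""
    triples.add((s, p, o))

# Global URIs for entities (invented identifiers, nothing resolves)
MS = "http://example.org/manuscript/ms-001"
SCRIBE = "http://example.org/person/scribe-a"

# Descriptive information kept separate from the entity itself
add(MS, "rdfs:label", "Example manuscript, 14th century")
add(MS, "dc:creator", SCRIBE)
add(SCRIBE, "rdfs:label", "Unknown scribe A")

# A scholarly activity linked to the entity, not conflated with it
ANNOT = "http://example.org/annotation/1"
add(ANNOT, "oa:hasTarget", MS)
add(ANNOT, "oa:bodyValue", "Note on the marginal illustrations")

def objects(subject, predicate):
    """All objects asserted for a given subject/predicate pair."""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Walk from the annotation back to the manuscript and then to its scribe
target = objects(ANNOT, "oa:hasTarget").pop()
print(objects(target, "dc:creator"))
```

Because each statement names entities by URI, two researchers’ annotations of the same manuscript can be merged by simply unioning their triple sets – which is roughly the ‘network graph of relatedness’ idea in miniature.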

He suggested a focus on organising knowledge, not collections, whether objects or texts. Collaborative activities should be based around this knowledge, using tools that work with linked data entities. This raised the issue of contested ground and the application of labels and meaning to data: your ‘discovery’ is my ‘invasion’. This makes citizen humanities problematic – who gets to describe, assign, link, and what does that mean for scholarly authority?

My notes aren’t clear but I think Burrows said these ideas were based on analysis of medieval manuscript research, which Jane Hunter had also worked on, and they were looking towards the architecture for HuNI. It was encouraging to see an approach to linked data so grounded in the complexity of historians research practices and data, and is yet another reason I’m looking forward to following HuNI’s progress – I think it will have valuable lessons for linked data projects in the rest of the world. [These slides from the Linked Open Data workshop in Melbourne a few weeks later show the academic workflow HuNI plans to support and some of the issues they’ll have to tackle.]

The second speaker was the University of Sydney’s Stephen Hayes on ‘how linked is linked enough?’. [See also Snurb’s liveblog.] He’s looking at projects through a linked data lens, trying to assess how much further projects need to go to comfortably claim to be linked data. He talked about the issues projects encountered trying to achieve 5-star Linked Data.

He looked at projects like the Dictionary of Sydney, which expresses data as RDF as well as in a public-facing HTML interface and comes close to winning 5 stars. It is a demonstration of the fact that once data is expressed in one form, it can be easily expressed in another form – stable entities can be recombined to form new structures. The project is powered by Heurist, a tool for managing a wide range of research data. The History of Balinese Painting could not find other institutions that exposed Balinese collection data in programmable form so they could link to them (presumably a common problem for early adopters but at least it helps solve the ‘chicken or the egg’ problem that dogs linked data in cultural heritage and the humanities). The site’s URLs don’t return useful metadata but they do try to refer to image URLs so it’s ‘sorta persistent’. He gave it a rating of 3.5 stars. Other projects mentioned (also built on Heurist?) were the Charles Harpur Critical Archive, rated at 3.5 stars, and Virtual Zagora, rated at 3 stars.

The paper was an interesting discussion of the team work required to get the full 5 stars of linked data, and the trade-offs in developing functions for structured data (e.g. implementing painting markup versus focussing on the quality of the human-facing pages); reassuring curators about how much data would be released and what would be kept back; developing ontologies throughout a project or in advance; and the overhead in mapping other projects’ concepts to their own version of Dublin Core.

The final paper in the session was ‘As Curious An Entity: Building Digital Resources from Context, Records and Data’ by Michael Jones and Antonina Lewis (abstract). [See also Snurb’s liveblog.] They said that improving the visibility of relationships between entities enriches archives, as does improving relationships between people. The title quote in full is ‘as curious an entity as bullshit writ on silk’ – if the parameters, variables and sources of data are removed from material, then it’s just bullshit written on silk. Visualisations remove sources, complexity and ‘relative context’, and would be richer if they could express changes in data over time and space. They asked how one would know that information presented in a visualisation is accurate if it doesn’t cite sources? You must seek and reference original material to support context layers.

They presented an overview of the Saulwick Archive project (Saulwick ran polls for the Fairfax newspapers for years) and the Australian Women’s Register, discussed common issues faced in digital humanities, and the role of linked data and human relationships in building digital resources. They discussed the value of maintaining relationships between archives and donors after the transfer of material, and the need to establish data management plans to make provision for raw data and authoritative versions of related contextual material, and to retain data to make sense of the archives in the future. The Australian Women’s Register includes content written for the site and links out to the archival repositories and libraries where the records are held. In a lovely phrase, they described records as the ‘evidential heart’ for the context and data layers. They also noted that the keynote overlooked non-academic re-use of digital resources, but it’s another argument for making data available where possible.

Digital histories session

The first paper was ‘Community Connections: The Renaissance of Local History’ by Lisa Murray. Murray discussed the ‘three Cs’ needed for local history: connectivity, community, collaboration.

Is the process of geo-referencing forcing historians to be more specific about when or where things happened? Are people going from the thematic to the particular? Is it exciting for local historians to see how things fit into state or national narratives? Digital history has enormous potential for local and family history and to represent complicated relationships within a community and how they’ve changed over time. Digital history doesn’t have to be article-centric – it enables new forms of presentation. Historians have to acknowledge that Wikipedia is aligned to historians’ processes. Local history is strongly represented on Wikipedia. The Dictionary of Sydney provides a universal framework for accessing Sydney’s history.

The democratisation of historical production is exciting but raises challenges for public understanding of how history is undertaken and represented. Are some histories privileged? Making History (a project by Museum Victoria and Monash University) encourages the use of online resources but does that privilege digitised sources, and will others be neglected? Are easily accessible sources privileged, and does that change what history is written? What about community collections or vast state archives that aren’t digitised?

History research methodologies are changing – Google etc is shaping how research is undertaken; the ubiquity of keyword searching reinforces the primacy of names. She noted the impact of family historians on how archives prioritise work. It’s not just about finding sources – to produce good history you need to analyse the sources. Professional historians are no longer the privileged producers of knowledge. History can be parochial and inclusive, but it can also lack a sense of historical perspective and context. Digital history production amplifies tensions between popular history and academic history [and presumably between amateur and academic historians?].

Apparently primary school students study more local history than university students do. Local and community history is produced by a broad spectrum of the community but relatively few academic historians are participating. There’s a risk of favouring quirky facts over significance and context. Unless history is more widely taught, local history will be tarred with the same brush as antiquarians. History is not only about narrative and context… Historians need to embrace the renaissance of local and community history.

In the questions there was some discussion of the implications of Sydney’s city archives being moved to a more inconvenient physical location. The justification is that it’s available through Ancestry but that removes it from all context [and I guess raises all the issues of serendipity etc in digital vs physical access to archives].

The next speaker was Tim Sherratt on ‘Inside the bureaucracy of White Australia’. His slides are online and his abstract is on the Invisible Australians site. The Invisible Australians project is trying to answer the question of what the White Australia policy looked like to a non-white Australian.  He talked about how digital technology can help explore the practice of exclusion as legislation and administrative processes were gradually elaborated. Chinese Australians who left Australia and wanted to return had to prove both their identity and their right to land to convince officials they could return: ‘every non-white resident was potentially a prohibited immigrant just waiting to be exposed’. He used topic modelling on file titles from archival series and was able to see which documents related to the White Australia policy. This is a change from working through hierarchical structures of archives to working directly through the content of archives. This provides a better picture of what hasn’t survived, what’s missing and would have many other exciting uses. [His post on Topic modelling in the archives explains it better than my summary would.]
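Sherratt’s technique of topic modelling file titles can be illustrated with a toy collapsed Gibbs sampler in pure Python. This is emphatically not his actual pipeline (which used proper topic-modelling software over real archival series); the file titles below are invented, and a real run would use thousands of titles and a library such as MALLET or gensim. The sketch just shows the mechanics: each word token is assigned a topic, then repeatedly resampled in proportion to how common that topic is in its document and how common the word is in that topic.

```python
# Toy collapsed Gibbs sampler for two topics over invented file titles.
import random
from collections import Counter

titles = [
    "certificate exempting from dictation test",
    "application for certificate of domicile",
    "dictation test exemption certificate",
    "lighthouse supply contract tender",
    "tender for lighthouse maintenance contract",
    "lighthouse keeper supply accounts",
]
docs = [t.split() for t in titles]
K, ALPHA, BETA = 2, 0.1, 0.1          # topics, doc and word smoothing
vocab = sorted({w for d in docs for w in d})

random.seed(42)
# Random initial topic assignment for every word token
z = [[random.randrange(K) for _ in d] for d in docs]
doc_topic = [Counter(zd) for zd in z]
topic_word = [Counter() for _ in range(K)]
topic_total = [0] * K
for d, zd in zip(docs, z):
    for w, t in zip(d, zd):
        topic_word[t][w] += 1
        topic_total[t] += 1

for _ in range(200):                   # Gibbs sweeps
    for i, d in enumerate(docs):
        for j, w in enumerate(d):
            t = z[i][j]                # remove current assignment
            doc_topic[i][t] -= 1
            topic_word[t][w] -= 1
            topic_total[t] -= 1
            # Resample: P(topic | doc) * P(word | topic), smoothed
            weights = [
                (doc_topic[i][k] + ALPHA)
                * (topic_word[k][w] + BETA)
                / (topic_total[k] + BETA * len(vocab))
                for k in range(K)
            ]
            t = random.choices(range(K), weights)[0]
            z[i][j] = t
            doc_topic[i][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1

for k in range(K):
    print(k, [w for w, _ in topic_word[k].most_common(3)])
```

With clearly separated vocabularies like these, the top words per topic tend to split along the two subject areas – which is the property that let file titles be sorted by likely subject without reading every file.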

The final paper was Paul Turnbull on ‘Pancake history’. He noted that in e-research there’s a difference between what you can use in teaching and what makes people nervous in the research domain. He finds it ironic that professional advancement for historians is tied to writing about doing history rather than doing history. He talked about the need to engage with disciplinary colleagues who don’t engage with digital humanities, and issues around historians taking digital history seriously.

Sherratt’s talk inspired discussion of funding small-scale as well as large-scale infrastructure, possibly through crowdfunding. Turnbull also suggested ‘seeding ideas and sharing small apps is the way to go’.

[Note from when I originally posted this: I don’t know when my flight is going to be called, so I’ll hit publish now and keep working until I board – there’s lots more to fit in for day 2! In the afternoon I went to the ‘Digital History’ session. I’ll tidy up when I’m in the UK as I think blogger is doing weird LTR things because it may be expecting Arabic.]

See also Slow and still dirty Digital Humanities Australasia notes: day 3.

Quick and dirty Digital Humanities Australasia notes: day 1

As always, I should have done this sooner and tidied them up more, but better rough notes than nothing, so here goes… The Australasian Association for Digital Humanities held their inaugural conference in Canberra in March, 2012. You can get an overall sense of the conference from the #DHA2012 tweets (I’ve put up a CSV archive of #DHA2012 tweets here, but note it’s not on Australian time) and from the keynotes.

In his opening keynote on the movements between close and distant reading, Alan Liu observed that the crux of the ‘reading’ issue depends on the field, and further, that ‘history is on a different evolutionary branch of digital humanities to literary studies’.  This is something I’ve been wondering about since finding myself back in digital humanities, and was possibly reflected in the variety of papers in the overall programme.  I was generally following sessions on digital history, geospatial themes and crowdsourcing, but there was so much in the programme that you could have followed a literary studies line and had a totally different conference experience.

In the next session I went to a panel on ‘Connecting Australia’s Cultural Datasets: A Vision for Collaboration’ with various people from the new ‘Humanities Networked Infrastructure’ (HuNI) (more background) presenting.  It started with Deb Verhoeven on ‘jailbreaking cultural data’ and the tension identified by Brand: “information wants to be expensive because it’s so valuable.  The right information in the right place just changes your life.  On the other hand, information wants to be free, because the cost of getting it out is lower and lower all the time. So you have these two things fighting against each other”. ‘Information wants to be social’: she discussed the need to understand the value of research in terms of community engagement, not just as academically ranked output, and to return research to the communities it investigates in meaningful ways.
Other statements that resonated were the need for organisational, semantic and technical interoperability in datasets to create collaborative environments. Collaboration requires data integration and exchange as well as dealing with different ideas about what ‘data’ is in different disciplines in the humanities. Collaboration in the cultural datasets community can follow unmet needs: discover data that’s currently hidden, make connections between disparate data sources, publish and share connections.

Ross Harley talked about how interoperability facilitates serendipity and trying to find new ways for data to collide. In the questions, Ingrid Mason asked about parallels with the GLAM (galleries, libraries, archives and museums) community, but it was also pointed out that GLAMs are behind in publishing their data – not everything HuNI wants to use is available yet.  I pointed out (on the twitter back channel) that requests for GLAM information from intensive users (e.g. researchers) helps memory institutions make the case for publishing more data – it’s still all a bit chicken-or-the-egg.

After lunch I went to the crowdsourcing session (not least cos I was presenting early results from my PhD in it).  The first presentation was on ‘crowdsourcing semantic tags on 3D museum artefacts’ which could have amazing applications for teaching material culture and criticism as well as source communities because it lets people annotate specific locations on a 3D model. Interestingly, during the questions someone reported people visiting campus classics museum who said they were enjoying seeing the objects in person but also wanted access to electronic versions – it’s fascinating watching audience expectations change.

The next presentation was on ‘Optimising crowdsourcing websites to increase volunteer participation’, a case study of NYPL’s What’s on the menu by Donelle McKinley, who used MECLAB/Flint McGlaughlin’s Conversion Sequence heuristic (clarity of value proposition, motivation, incentive, friction, anxiety) to assess how the project’s design was optimised to motivate audience participation.  Donelle’s analysis is really useful for people thinking about designing for crowdsourcing, but I’m not sure my notes do it justice, and I’m afraid I didn’t get many notes for Pauline Cockrill’s ‘Using Web 2.0 to make new connections in community history’ as I was on just afterwards.  One point I tweeted was about a quick win for crowdsourcing in using real-world communities as pointers to successful online collaborations, but I’m not sure now who said it.

One comment I noted during the discussion was “a real pain about Old Weather was that you’d get into working on a ship and it would just sail off on you” – interfaces that work for the organisation don’t always work for the audience.  This session was generally useful for clarifying my thoughts on the tension between optimising for efficiency or engagement in cultural heritage crowdsourcing projects.

In the interests of getting this posted I’ll stop here and call this ‘day 1’. I’m not sure if any of the slides are available yet, but I’ll update and link to any presentations or other write-ups I find. There’s a live blog of many sessions at

[Update: I’ve posted about Day 2 at Quick and dirty Digital Humanities Australasia notes: day 2 and Slow and still dirty Digital Humanities Australasia notes: day 3.]