April news in crowdsourcing, citizen science, citizen history

Another quick post with news on crowdsourcing in cultural heritage, citizen science and citizen history in April(ish) 2016…

Acceptances for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing? have been sent out. If you missed the boat, don’t panic! We’re taking a few more applications on a rolling basis to allow for people with late travel approval for the DH2016 conference in July.

Probably the biggest news is the launch of citizenscience.gov, as it signals the importance of citizen science and crowdsourcing to the US government.

From the press release: ‘the White House announced that the U.S. General Services Administration (GSA) has partnered with the Woodrow Wilson International Center for Scholars (WWICS), a Trust instrumentality of the U.S. Government, to launch CitizenScience.gov as the new hub for citizen science and crowdsourcing initiatives in the public sector.

CitizenScience.gov provides information, resources, and tools for government personnel and citizens actively engaged in or looking to participate in citizen science and crowdsourcing projects. … Citizen science and crowdsourcing are powerful approaches that engage the public and provide multiple benefits to the Federal government, volunteer participants, and society as a whole.’

There’s also work to ‘standardize data and metadata related to citizen science, allowing for greater information exchange and collaboration both within individual projects and across different projects’.

Other news:

Responses to questions about if the volunteers agreed that the Zooniverse… From Science Learning via Participation in Online Citizen Science

Have I missed something important? Let me know in the comments or @mia_out.

SXSW, project anniversaries and more – news on heritage crowdsourcing

Our panel listing at SXSW

I’ve just spent two weeks in Texas, enjoying the wonderful hospitality and probing questions after giving various talks at universities in Houston and Austin before heading to SXSW. I was there for a panel on ‘Build the Crowdsourcing Community of Your Dreams’ (link to our slides and collected resources) with Ben Brumfield, Siobhan Leachman, and Meghan Ferriter. Siobhan, a ‘super-volunteer’ in more ways than one, posted her talk notes on ‘How cultural institutions encouraged me to participate in crowdsourcing & the factors I consider before donating my time’.

In other news, we (me, Ben, Meghan and Christy Henshaw from the Wellcome Library) have had a workshop accepted for the Digital Humanities 2016 conference, to be held in Kraków in July. We’re looking for people with different kinds of expertise for our DH2016 Expert Workshop: Beyond The Basics: What Next For Crowdsourcing?.  You can apply via this form.

One of the questions at our SXSW panel was about crowdsourcing in teaching, which reminded me of this recent post on ‘The War Department in the Classroom’ in which Zayna Bizri ‘describes her approach to using the Papers of the War Department in the classroom and offers suggestions for those who wish to do the same’. In related news, the PWD project is now five years old! There’s also this post on Primary School Zooniverse Volunteers.

The Science Gossip project is one year old, and they’re asking their contributors to decide which periodicals they’ll work on next and to start new discussions about the documents and images they find interesting.

The History Harvest project have released their Handbook (PDF).

The Danish Nationalmuseet is having a ‘Crowdsource4dk’ crowdsourcing event on April 9. You can also transcribe Churchill’s WWII daily appointments, 1939 – 1945 or take part in Old Weather: Whaling (and there’s a great Hyperallergic post with lots of images about the whaling log books).

I’ve seen a few interesting studentships and jobs posted lately, hinting at research and projects to come. There’s a funded PhD in HCI and online civic engagement and a (now closed) studentship on Co-creating Citizen Science for Innovation.

And in old news, this 1996 post on FamilySearch’s collaborative indexing is a good reminder that very little is entirely new in crowdsourcing.

From grey dots to trenches to field books – news in heritage crowdsourcing

Apparently you can finish a thesis but you can’t stop scanning for articles and blog posts on your topic. Sharing them here is a good way to shake the ‘I should be doing something with this’ feeling.* This is a fairly random sample of recent material, but if people find it useful I can go back and pull out other things I’ve collected.

Victoria Van Hyning, ‘What’s up with those grey dots?’ you ask – brief blog post on using software rather than manual processes to review multiple text transcriptions, and on the interface challenges that brings.

Melissa Terras, ‘Crowdsourcing in the Digital Humanities’ – pre-print PDF for a chapter in A New Companion to Digital Humanities.

Richard Grayson, ‘A Life in the Trenches? The Use of Operation War Diary and Crowdsourcing Methods to Provide an Understanding of the British Army’s Day-to-Day Life on the Western Front’ – a peer-reviewed article based on data created through Operation War Diary.

The Impact of Coordinated Social Media Campaigns on Online Citizen Science Engagement – a poster by Lesley Parilla and Meghan Ferriter reported on the Biodiversity Heritage Library blog.


Ben Brumfield, Crowdsourcing Transcription Failures – a response to a mailing list post asking ‘where are the failures?’

And finally, something related to my interest in participatory history commons: Martin Luther King Jr. Memorial Library – Central Library launches Memory Lab, a ‘DIY space where you can digitize your home movies, scan photographs and slides, and learn how to care for your physical and digital family heirlooms’. I was so excited when I heard about this project – it’s addressing such important issues. Jaime Mears is blogging about the project.

 

* How long after a PhD does it take for that feeling to go? Asking for a friend.

The state of museum technology?

On Friday I was invited to Nesta’s Digital Culture Panel event to respond to their 2015 Digital Culture survey on ‘How arts and cultural organisations in England use technology’ (produced with Arts Council England (ACE) and the Arts and Humanities Research Council (AHRC)). As Chair of the Museums Computer Group (MCG) (a practitioner-led group of over 1500 museum technology professionals), I’ve been chatting to other groups about the gap between the digital skills available and those needed in the museum sector, so it’s a subject close to my heart. In previous years I’d noted that the results didn’t seem to represent what I knew of museums and digital from events and working in the sector, so I was curious to see the results.

Some of their key findings for museums (PDF) are below, interspersed with my comments. I read this section before the event, and found I didn’t really recognise the picture of museums it presented. ‘Museums’ mightn’t be the most useful grouping for a survey like this – the material that MTM London’s Ed Corn presented on the day broke the results down differently, and that made more sense. The c2,500 museums in the UK are too varied in their collections (from dinosaurs to net art), their audiences, and their local and organisational context (from tiny village museums open one afternoon a week, to historic houses, to university museums, to city museums with exhibitions that were built in the 70s, to white cube art galleries, to giants like the British Museum and Tate) to be squished together in one category. Museums tend to be quite siloed, so I’d love to know who fills out the survey, and whether they ask the whole organisation to give them data beforehand.

According to the survey, museums are significantly less likely to engage in:

  • email marketing (67 per cent vs. 83 per cent for the sector as a whole) – museums are missing out! Email marketing is relatively cheap, and it’s easy to write newsletters. It’s also easy to ask people to sign up when they’re visiting online sites or physical venues, and they can unsubscribe anytime they want to. Social media figures can look seductively huge, but Facebook is a frenemy for organisations as you never know how many people will actually see a post.
  • publish content to their own website (55 per cent vs. 72 per cent) – I wasn’t sure how to interpret this – does this mean museums don’t have their own websites? Or that they can’t update them? Or is ‘content’ a confusing term? At the event it was said that 10% of orgs have no email marketing, website or Facebook, so there are clearly some big gaps to fill still.
  • sell event tickets online (31 per cent vs. 45 per cent) – fair enough, how many museums sell tickets to anything that really needs to be booked in advance?
  • post video or audio content (31 per cent vs. 43 per cent) – for most museums, this would require an investment to create as many don’t already have filmable material or archived films to hand. Concerns about ‘polish’ might also be holding some museums back – they could try periscoping tours or sharing low-fi videos created by front of house staff or educators. Like questions about offering ‘online interactive tours of real-world spaces’ and ‘artistic projects’, this might reflect initial assumptions based on ACE’s experience with the performing arts. A question about image sharing would make more sense for museums. Similarly, the kinds of storytelling that blog posts allow can sometimes work particularly well for history and science museums (who don’t have gorgeous images of art that tell their own story).
  • make use of social media video advertising (18 per cent vs. 32 per cent) – again, video is a more natural format for performing arts than for museums
  • use crowdfunding (8 per cent vs. 19 per cent) – crowdfunding requires a significant investment of time and is often limited to specific projects rather than core business expenses, so it might be seen as too risky, but is this why museums are less likely to try it?
  • livestream performances (2 per cent vs. 12 per cent) – again, this is less likely to apply to museums than performing arts organisations

One of the key messages in Ed Corn’s talk was that organisations are experimenting less, evaluating the impact of digital work less, and not using data in digital decision making. They’re also scaling back on non-core work; some are focusing on consolidation – fixing the basics like websites (and mobile-friendly sites). Barriers include lack of funding, lack of in-house time, lack of senior digital managers, slow/limited IT systems, and lack of a digital supplier. (Many of those barriers were also listed in a small-scale survey on ‘issues facing museum technologists’ I ran in 2010.)

When you consider the impact of the cuts year on year since 2010, and that ‘one in five regional museums at least part closed in 2015’, some of those continued barriers are less surprising. At one point everyone I know still in museums seemed to be doing at least one job on top of theirs, as people left and weren’t replaced. The cuts might have affected some departments more deeply than others – have many museums lost learning teams? I suspect we’ve also lost two generations of museum technologists – the retiring generation who first set up mainframe computers in basements, and the first generation of web-ish developers who moved on to other industries as conditions in the sector got more grim/good pay became more important. Fellow panelist Ros Lawler also made the point that museums have to deal with legacy systems while also trying to look at the future, and that museum projects tend to be slow when they could be more agile.

Like many in the audience, I really wanted to know who the ‘digital leaders’ – the 10% of organisations who thought digital was important, did more digital activities and reaped the most benefits from their investment – were, and what made them so successful. What can other organisations learn from them?

It seems that we still need to find ways to share lessons learnt, and to help everyone in the arts and cultural sectors learn how to make the most of digital technologies and social media.  Training that meets the right need at the right time is really hard to organise and fund, and there are already lots of pockets of expertise within organisations – we need to get people talking to each other more! As I said at the event, most technology projects are really about people. Front of house staff, social media staff, collections staff – everyone can contribute something.

If you were there, have read the report or explored the data, I’d love to know what you think. And I’ll close with a blatant plug: the MCG has two open calls for papers a year, so please keep an eye out for those calls and suggest talks or volunteer to help out!

Exercises for ‘The basics of crowdsourcing in cultural heritage’

I’m running a workshop (at a Knowledge Exchange event organised by the Scottish Network on Digital Cultural Resources Evaluation and the Museums Galleries Scotland Digital Transformation Network) to help people get started with crowdsourcing in cultural heritage. These exercises are designed to give participants some hands-on experience with existing projects while developing their ability to discuss the elements of successful crowdsourcing projects. They are also an opportunity to appreciate the importance of design and text in marketing a project, and the role of user experience design in creating projects that attract and retain contributors.

Exercise: compare front pages

Choose two of the sites below to review.

The most important question to keep in mind is: how effective is the front page at making you want to participate in a project? How does it achieve that?

Exercise: try some crowdsourcing projects

Try one of the sites listed above; others are listed in this post; non-English language sites are listed here. You can also ask for suggestions!

Attributes to discuss include:

The overall ‘call to action’

  • Is the first step toward participating obvious?
  • Is the type of task, source material and output obvious?

Probable audience

  • Can you tell who the project wants to reach?
  • Does text relate to their motivations for starting, continuing?
  • How are they rewarded?
  • Are there any barriers to their participation?

Data input and data produced

  • What kinds of tasks create that data?
  • How are contributions validated?

How productive, successful does the site seem overall?

Exercise: lessons from game design

  • Go to http://git.io/2048
  • Spend 2 minutes trying it out
  • Did you understand what to do?
  • Did you want to keep playing?

Exercise: your plans

Some questions to help make ideas into reality:

  • Who already loves and/or uses your collections?
  • Which material needs what kind of work?
  • Do any existing platforms meet most of your needs?
  • What potential barriers could you turn into tasks?
  • How will you resource community interaction?
  • How would a project support your mission, engagement strategy and digitisation goals?

Digital curator at the British Library?!

Kings Library Tower, British Library

I have a new job! I’m the newest Digital Curator at the British Library. That link takes you to a post on the BL blog for a bit more about what my job involves. If you’ve read any of my posts over the past couple of years, you’ll know that working to encourage digital scholarship is a pretty good fit for my research and teaching interests.

In other news, I passed my PhD viva! I’ve got a couple of minor corrections to fit in around work and various papers, and then my PhD is over! (Unless I decide to publish from my thesis, of course…)

My ‘Welcome’ notes for UKMW15 ‘Bridging Gaps, Making Connections’

I’m at the British Museum today for the Museums Computer Group’s annual UK ‘Museums on the Web’ conference. UKMW15 has a packed line-up full of interesting presentations. As Chair of the MCG, I briefly introduced the event. My notes are below, in part to make sure that everyone who should be thanked is thanked! You can read a more polished version of this written with my Programme Committee Co-Chair Danny Birchall in a Guardian Culture Professionals article, ‘How digital tech can bridge gaps between museums and audiences’.

UK Museums on the Web 2015: ‘Bridging Gaps, Making Connections’ #UKMW15

I’d like to start by thanking everyone who helped make today happen, and by asking the MCG Committee Members who are here today to stand up, so that you can chat to them, ideally even thank them, during the day. For those who don’t know us, the Museums Computer Group is a practitioner-led group who work to connect, support and inspire anyone working in museum technology. (There are lots of ways to get involved – we’re electing new committee members at our AGM at lunchtime, and we will also be asking for people to host next year’s event at their museum or help organise a regional event.)

I’d particularly like to thank Ina Pruegel and Jennifer Ross, who coordinated the event, the MCG Committee members who did lots of work on the event (Andrew, Dafydd, Danny, Ivan, Jess, Kath, Mia, Rebecca, Rosie), and the Programme Committee members who reviewed presentation proposals sent in. They were: co-chairs: Danny Birchall and Mia Ridge, with Chris Michaels (British Museum), Claire Bailey Ross (Durham University), Gill Greaves (Arts Council England), Jenny Kidd (Cardiff University), Jessica Suess (Oxford University Museums), John Stack (Science Museum Group), Kim Plowright (Mildly Diverting), Matthew Cock (Vocal Eyes), Rachel Coldicutt (Friday), Sara Wajid (National Maritime Museum), Sharna Jackson (Hopster), Suse Cairns (Baltimore Museum of Art), Zak Mensah (Bristol Museums, Galleries & Archives).

And of course I’d like to thank the speakers and session chairs, the British Museum, Matt Caines at the Guardian, and in advance I’d like to thank all the tweeters, bloggers and photographers who’ll help spread this event beyond the walls of this room.

Which brings me to the theme of the event, ‘Bridging Gaps, Making Connections’. We’ve been running UK Museums on the Web since 2001; last year our theme was ‘museums beyond the web’ in recognition that barriers between ‘web teams’ and ‘web projects’ and the rest of the organisation were breaking down. But it’s also apparent that the gap between tiny, small, and even medium-sized museums and the largest, best-funded museums meant that digital expertise and knowledge had not reached the entire sector. The government’s funding cuts and burnout mean that old museum hands have left, and some who replace them need time to translate their experience in other sectors into museums. Our critics and audiences are confused about what to expect, and museums are simultaneously criticised for investing too much in technologies that disrupt the traditional gallery and for being ‘dull and dusty’. Work is duplicated across museums, libraries, archives and other cultural organisations; academic and commercial projects sometimes seem to ignore the wealth of experience in the sector.

So today is about bridging those gaps, and about making new connections. (I’ve made my own steps in bridging gaps by joining the British Library as a Digital Curator.) We have a fabulous line-up representing the wealth and diversity of experience in museum technologies.

So take lots of notes to share with your colleagues. Use your time here to find people to collaborate with. Tweet widely. Ask MCG Committee members to introduce you to other people here. Let people with questions know they can post them on the MCG discussion list and connect with thousands of people working with museums and technology. Now, more than ever, an event like this isn’t about technology; it’s about connecting and inspiring people.

Who’s inspired me in 2015?

Ironically, the internet was down on the evening of Ada Lovelace Day 2015, an annual, international ‘celebration of the achievements of women in science, technology, engineering and maths (STEM)’, so I couldn’t post at the time. Belatedly, the people whose achievements I’ve admired are:

Anna Powell-Smith, who has made lots of cool things like the first free online copy of the Domesday Book, a map of offshore land ownership and What Size Am I?, and also volunteers for mySociety.

Professor Monica Grady, whose joy when the Rosetta mission’s probe Philae successfully landed on comet 67P is just about the most wonderful thing on the internet (and she worked on one of the instruments on board, which is very cool). Like New Horizons sending back images of Pluto, it’s a reminder of the awe-inspiring combination of planning, foresight, science and engineering in space that has made 2015 so interesting.

Finally, I love this image of Margaret Hamilton, lead software engineer on Project Apollo (1969), with some of the Apollo Guidance Computer (AGC) source code.

How an ecosystem of machine learning and crowdsourcing could help you

Back in September last year I blogged about what advances in machine learning mean for cultural heritage and digital humanities crowdsourcing projects that use simple tasks as the first step in public engagement: fun, easy tasks like image tagging and text transcription could be done by computers. (Broadly speaking, ‘machine learning’ is a label for technologies that allow computers to learn from the data available to them. It means they don’t have to be specifically programmed to know how to do a task like categorising images – they can learn from the material they’re given.) One reason I like crowdsourcing in cultural heritage so much is that time spent on simple tasks can provide opportunities for curiosity, help people find new research interests, and help them develop historical or scientific skills as they follow those interests. People can notice details that computers would overlook, and those moments of curiosity can drive all kinds of new inquiries. I concluded that, rather than taking the best tasks from human crowdsourcers, ‘human computation’ systems that combine the capabilities of people and machines can free up our time for the harder tasks and more interesting questions.

I’ve been thinking about ‘ecosystems’ of crowdsourcing tasks since I worked on museum metadata games back in 2010. An ecosystem of tasks – for example, classifying images into broad types and topics in one workflow so that people can find text to transcribe on subjects they’re interested in, and marking up that text with relevant subjects in a final workflow – means that each task can be smaller (and thereby faster and more enjoyable). Other workflows might validate the classifications or transcribed text, allowing participants with different interests, motivations and time constraints to make meaningful contributions to a project. The New York Public Library’s Building Inspector is an excellent example of this – they offer five tasks (checking or fixing automatically-detected building ‘footprints’, entering street numbers, classifying colours or finding place names), each as tiny as possible, which together result in a complete set of checked and corrected building footprints and addresses. (They’ve also pre-processed the maps to find the building footprints so that most of the work has already been done before they asked people to help.)

Check building footprints in NYPL’s Building Inspector

After teaching ‘crowdsourcing cultural heritage’ at HILT over the summer, where the concept of ‘ecosystems’ of crowdsourced tasks was put into practice as we thought about combining classification-focused systems like Zooniverse’s Panoptes with full-text transcription systems, I thought it could be useful to give some specific examples of ecosystems for human computation in cultural heritage. If there are daunting data cleaning, preparation or validation tasks necessary before or after a core crowdsourcing task, computational ecosystems might be able to help. So how can computational ecosystems help pre- and post-process cultural heritage data for a better crowdsourcing experience?

While older ecosystems like Project Gutenberg and Distributed Proofreaders have been around for a while, we’re only just seeing the huge potential for combining people + machines into crowdsourcing ecosystems. The success of the Smithsonian Transcription Center points to the value of ‘niche’ mini-projects, but breaking vast repositories into smaller sets of items about particular topics, times or places also takes resources. Machines can learn to classify source material by topic, by type, by difficulty or any other system that crowdsourcers can teach it. You can improve machine learning by giving systems ‘ground truth’ datasets with (for example) a crowdsourced transcription of the text in images, and as Ted Underwood pointed out on my last post, comparing the performance of machine learning and crowdsourced transcriptions can provide useful benchmarks for the accuracy of each method. Small, easy correction tasks can help improve machine learning processes while producing cleaner data.
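One simple way to benchmark the two methods against each other is a character error rate: the number of single-character edits needed to turn a transcription into a trusted reference version, divided by the length of the reference. The sketch below is only an illustration of that idea. The sample sentences are invented, and treating one checked transcription as the ‘ground truth’ is an assumption a real project would need to justify.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, candidate: str) -> float:
    """Edits needed to reach the reference, per reference character."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, candidate) / len(reference)


# Invented example data, not taken from any real project.
ground_truth = "the ship left port at dawn"
machine_output = "the ship lelt port at dawn"
volunteer_output = "the ship left port at dawn"

machine_cer = character_error_rate(ground_truth, machine_output)
volunteer_cer = character_error_rate(ground_truth, volunteer_output)
print(f"machine: {machine_cer:.3f}, volunteer: {volunteer_cer:.3f}")
```

A lower rate means fewer corrections are needed, so the same measure can be used to decide which pages are worth sending to volunteers as small correction tasks.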

Computational ecosystems might be able to provide better data validation methods. Currently, tagging tasks often rely on raw consensus counts when deciding whether a tag is valid for a particular image. This is a pretty crude measure – while three non-specialists might apply terms like ‘steering’ to a picture of a ship, a sailor might enter ‘helm’, ‘tiller’ or ‘wheelhouse’, but their terms would be discarded if no-one else enters them. Mining disciplinary-specific literature for relevant specialist terms, or finding other signals for subject-specific expertise would make more of that sailor’s knowledge.
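To make the difference concrete, here is a minimal sketch of both approaches. The function name `validate_tags`, the two-vote threshold and the tiny nautical word list are all hypothetical, chosen only to mirror the ship example; a real project would draw its specialist terms from a proper vocabulary or from the literature.

```python
from collections import Counter


def validate_tags(entries, min_votes=2, specialist_terms=frozenset()):
    """Split the tags entered for one image into accepted and discarded sets.

    A tag is accepted if enough volunteers entered it (raw consensus),
    or if it appears in a list of known specialist terms.
    """
    counts = Counter(tag.strip().lower() for tag in entries)
    accepted, discarded = set(), set()
    for tag, votes in counts.items():
        if votes >= min_votes or tag in specialist_terms:
            accepted.add(tag)
        else:
            discarded.add(tag)
    return accepted, discarded


# Invented submissions for a picture of a ship.
entries = ["steering", "Steering", "steering", "helm"]

# Raw consensus only: the sailor's single 'helm' is thrown away.
raw_accepted, raw_discarded = validate_tags(entries)

# With a (hypothetical) specialist vocabulary, 'helm' survives.
nautical_terms = frozenset({"helm", "tiller", "wheelhouse"})
accepted, discarded = validate_tags(entries, specialist_terms=nautical_terms)
```

The vocabulary check is just one possible signal of expertise; a contributor's track record on similar images could be weighed in the same place.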

Computational ecosystems can help at the personal, as well as the project level. One really exciting development is computational assistance during crowdsourcing tasks. In Transcribing Bentham … with the help of a machine?, Tim Causer discusses TSX, a new crowdsourced transcription platform from the Transcribe Bentham and tranScriptorium projects. You can correct transcriptions generated by Handwritten Text Recognition (HTR), which is a big advance in itself. Most importantly, you can also request help if you get stuck transcribing a specific word. Previously, you’d have to find a friendly human to help with this task. And from here, it shouldn’t be too difficult to combine HTR with computational systems to give people individualised feedback on their transcriptions. The potential for helping people learn palaeography is huge! Better validation techniques would also improve the participants’ experience. Providing personalised feedback on the first tasks a participant completes would help reassure them while nudging them to improve weaker skills.

Most science and heritage projects working on human computation are very mindful of the impact of their choices on the participants’ experience. However, there’s a risk that anyone who treats human computation like a computer science problem (for example, computationally assigning tasks to the people with the best skills for them) will lose sight of the ‘human’ part of the project. Individual agency is important, and learning or mastering skills is an important motivation. Non-profit crowdsourcing should never feel like homework. We’re still learning about the best ways to design crowdsourcing tasks, and that job is only going to get more interesting.

The good, the bad, and the unstructured… Open data in cultural heritage

I was in London this week for the Linked Pasts event, where I presented on trends and practices for open data in cultural heritage. Linked Pasts was a colloquium put on by the Pelagios project (Leif Isaksen, Elton Barker and Rainer Simon with Pau de Soto). I really enjoyed the other papers, which included thoughtful, grounded approaches to structured data for historical periods, places and people,  recognition of the importance of designing projects around audience needs (including user research), the relationship between digital tools and scholarly inquiry, visualisations as research tools, and the importance of good infrastructure for digital history.

My talk notes are below the embedded slides.

 

Warning: generalisations ahead.

My discussion points are based on years of conversations with other cultural heritage technologists in museums, libraries, and archives, but inevitably I’ll have blind spots. For example, I’m focusing on the English-speaking world, which means I’m not discussing the great work that Dutch and Japanese organisations are doing. I’ve undoubtedly left out brilliant specific examples in the interests of focusing on broader trends.  The point is to start conversations, to bring issues out into the open so we can collectively decide how to move forward.

The good

The good news is that more and more open cultural data is being published. Organisations have figured out that a) nothing bad is likely to happen and that b) they might get some kudos for releasing open data.

Generally, organisations are publishing the data that they have to hand – this means it’s mostly collections data. This data is often as messy, incomplete and fuzzy as you’d expect from records created by many different people using many different systems over a hundred or more years.

…the bad…

Copyright restrictions mean that images mightn’t be included. Furthermore, because it’s often collections data, it’s not necessarily rich in interpretative information. It’s metadata rather than data. It doesn’t capture the scholarly debates, the uncertain attributions, the biases in collecting… It certainly doesn’t capture the experience of viewing the original object.

Licensing issues are still a concern. Until cultural organisations are rewarded by their funders for releasing open data, and funders free organisations from expectations for monetising data, there will be damaging uncertainty about the opportunity cost of open data.

Non-commercial licences are also an issue – organisations and scholars might feel exploited if others who have not contributed to the process of creating it can commercially publish their work. Finally, attribution is an important currency for organisations and scholars but most open licences aren’t designed with that in mind.

…and the unstructured

The data that’s released is often pretty unstructured. CSV files are very easy to use, so they help more people get access to information (assuming they can figure out GitHub), but a giant dump like this doesn’t provide stable URIs for each object. Records in data dumps rarely link to external identifiers like the Getty’s Thesaurus of Geographic Names, Art & Architecture Thesaurus (AAT) or Union List of Artist Names, or vernacular sources for place and people names such as Geonames or DBPedia. And that’s fair enough, because people using a CSV file probably don’t want all the hassle of dereferencing each URI to grab the place name so they can visualise data on a map (or whatever they’re doing with the data). But it also means that it’s hard for someone to reliably look for matching artists in their database, and link these records with data from other organisations.
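As a rough illustration of that matching problem, the sketch below reads a small CSV and tries to attach an identifier to each artist name. Everything in it is made up for the example: the sample rows, the `example.org` URI and the `normalise` and `link_records` helpers. Real reconciliation against a source like the Union List of Artist Names would need far more careful matching than this.

```python
import csv
import io
import unicodedata


def normalise(name: str) -> str:
    """Reduce a name to a comparable key: no accents, case, punctuation or order."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def link_records(csv_text: str, authority: dict) -> list:
    """Add an 'artist_uri' to each row, or None when no match is found."""
    lookup = {normalise(name): uri for name, uri in authority.items()}
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["artist_uri"] = lookup.get(normalise(row["artist"]))
        rows.append(row)
    return rows


# Invented sample data; the URI is a placeholder, not a real identifier.
authority = {"J. M. W. Turner": "https://example.org/artist/0001"}
csv_text = '''object_id,title,artist
1,Harbour at dusk,"Turner, J. M. W."
2,Study of clouds,J.M.W. Turner
3,Portrait of a sailor,Unknown
'''

rows = link_records(csv_text, authority)
for row in rows:
    print(row["object_id"], row["artist"], row["artist_uri"])
```

Both spellings of the same artist resolve to one identifier, while the ‘Unknown’ record is left unlinked rather than guessed at. Publishing that identifier as an extra CSV column would keep the file easy to use while making links to other collections possible.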

So it’s open, but it’s often not very linked. If we’re after a ‘digital ecosystem of online open materials’, this open data is only a baby step. But it’s often where cultural organisations finish their work.

Classics > Cultural Heritage?

But many others, particularly in the classical and ancient world, have managed to overcome these issues to publish and use linked open data. So why do museums, libraries and archives seem to struggle? I’ll suggest some possible reasons as conversation starters…

Not enough time

Organisations are often busy enough keeping their internal systems up and running, dealing with the needs of visitors in their physical venues, working on ecommerce and picture library systems…

Not enough skills

Cultural heritage technologists are often generalists, and apart from being too time-stretched to learn new technologies for the fun of it, they might not have the computational or information science skills necessary to implement the full linked data stack.

Some cultural heritage technologists argue that they don’t know of any developers who can negotiate the complexities of SPARQL endpoints, so why publish it? The complexity is multiplied when complex data models are used with complex (or at least, unfamiliar) technologies. For some, SPARQL puts the ‘end’ in ‘endpoint’, and ‘RDF triples’ can seem like an abstraction too far. In these circumstances, the instruction to provide linked open data as RDF is a barrier they won’t cross.
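If triples seem abstract, it might help to see the idea stripped of the tooling. This is a rough sketch in plain Python, not real RDF or SPARQL: the statements, the predicate names (‘createdBy’, ‘bornIn’) and the example.org URIs are all made up for illustration. Each statement is a (subject, predicate, object) tuple, and a ‘query’ is just a pattern with some slots left open.

```python
# Hypothetical statements as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.org/object/A1", "createdBy", "http://example.org/artist/1"),
    ("http://example.org/object/A2", "createdBy", "http://example.org/artist/1"),
    ("http://example.org/object/A3", "createdBy", "http://example.org/artist/2"),
    ("http://example.org/artist/1", "bornIn", "http://example.org/place/9"),
    ("http://example.org/artist/2", "bornIn", "http://example.org/place/7"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None means 'anything' for that slot."""
    return [
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def objects_by_birthplace(triples, place):
    """Chain two patterns: find artists born in a place, then the objects they created."""
    artists = {s for s, _, _ in match(triples, predicate="bornIn", obj=place)}
    return [s for s, _, o in match(triples, predicate="createdBy") if o in artists]


found = objects_by_birthplace(TRIPLES, "http://example.org/place/9")
```

A real triple store adds shared vocabularies, indexing and a query language on top, but the underlying shape is no more exotic than this: small statements that can be chained because they use the same identifiers.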

But sometimes it feels as if some heritage technologists are unnecessarily allergic to complexity. Avoiding unnecessary complexity is useful, but progress can stall if they demand that everything remains simple enough for them to feel comfortable. Some technologists might benefit from working with people more used to thinking about structured data, such as cataloguers, registrars etc. Unfortunately, linked open data falls in the gap between the technical and the informatics silos that often exist in cultural organisations.

And organisations are also not yet using triples or structured data provided by other organisations. They’re publishing data in broadcast mode; it’s not yet a dialogue with other collections.

Not enough data

In a way, this is the collections documentation version of the technical barriers. If the data doesn’t already exist, it’s hard to publish. If it needs work to pull it out of different departments, or different individuals, who’s going to resource that work? Similarly, collections staff are unlikely to have time to map their data to CIDOC-CRM unless there’s a compelling reason to do so. (And some of the examples given might use cultural heritage collections but are a better fit with the work of researchers outside the institution than the institution’s own work).

It may be easier for some types of collections than others – art collections tend to be smaller and better described; natural history collections can link into international projects for structured data, and libraries can share cataloguing data. Classicists have also been able to get a critical mass of data together. Your local records office or small museum may have more heterogeneous collections, and there are fewer widely used ontologies or vocabularies for historical collections. The nature of historical collections means that ‘small ontologies, loosely joined’, may be more effective, but creating these, or mapping collections to them, is still a large piece of work. While there are tools for mapping to data structures like Europeana’s data model, it seems the reasons for doing so haven’t been convincing enough, so far. Which brings me to…

Not enough benefits

This is an important point, and an area the community hasn’t paid enough attention to in the past. Too many conversations have jumped straight to discussion about the specific standards to use, and not enough have been about the benefits for heritage audiences, scholars and organisations.

Many technologists – who are the ones making decisions about digital standards, alongside the collections people working on digitisation – are too far removed from the consumers of linked open data to see the benefits of it unless we show them real world needs.

There’s a cost in producing data for others, so it needs to be linked to the mission and goals of an organisation. Organisations are not generally able to prioritise the potential, future audiences who might benefit from tools someone else creates with linked open data when they have so many immediate problems to solve first.

While some cultural and historical organisations have done good work with linked open data, the purpose can sometimes seem rather academic. Linked data is not always explained so that the average, over-worked collections or digital team will be convinced that the benefits outweigh the financial and intellectual investment.

No-one’s drinking their own champagne

You don’t often hear of people beating on the door of a museum, library or archive asking for linked open data, and most organisations are yet to map their data to specific, widely-used vocabularies because they need to use them in their own work. If technologists in the cultural sector are isolated from people working with collections data and/or research questions, then it’s hard for them to appreciate the value of linked data for research projects.

The classical world has benefited from small communities of scholar-technologists – so they’re not only drinking their own champagne, they’re throwing parties. Smaller, more contained collections of sources and research questions help create stronger connections and give people a reason to link their sources. And as we’re learning throughout the day, community really helps motivate action.

(I know it’s normally called ‘eating your own dog food’ or ‘dogfooding’ but I’m vegetarian, so there.)

Linked open data isn’t built into collections management systems

Getting linked open data into collections management systems should mean that publishing linked data is an automatic part of sharing data online.

Chicken or the egg?

So it’s all a bit ‘chicken or the egg’ – will it stay that way? Until there’s a critical mass, probably. These conversations about linked open data in cultural heritage have been going around for years, but they also show how far we’ve come.

[And if you’ve published open data from cultural heritage collections, linked open data on the classical or ancient world, or any other form of structured data about the past, please add it to the wiki page for museum, gallery, library and archive APIs and machine-readable data sources for open cultural data.]

Drink your own champagne! (Nasjonalbiblioteket image)